[cfe-commits] r159223 - in /cfe/trunk: include/clang-c/ include/clang/AST/ lib/AST/ test/Index/ tools/c-index-test/ tools/libclang/ unittests/ unittests/AST/
Dmitri Gribenko
gribozavr at gmail.com
Wed Jun 27 09:33:06 PDT 2012
Hi Jordan,
Thank you for review.
On Tue, Jun 26, 2012 at 5:18 PM, Jordan Rose <jordan_rose at apple.com> wrote:
> Hi, Dmitri. Sorry for not pre-commit reviewing.
>
> On Jun 26, 2012, at 13:39 , Dmitri Gribenko <gribozavr at gmail.com> wrote:
>
>> + /// Contains text value associated with a token.
>> + const char *TextPtr1;
>> + unsigned TextLen1;
>> +
>> + /// Contains text value associated with a token.
>> + const char *TextPtr2;
>> + unsigned TextLen2;
>
> Why not StringRefs?
Per Doug's comment:
> + /// Contains text value associated with a token.
> + StringRef TextValue1;
> +
> + /// Contains text value associated with a token.
> + StringRef TextValue2;
>
> It would be nice if Token were a POD type. Could we just store ptr/length for TextValue1 and TextValue2, so that Token remains a POD?
>> +std::string BriefParser::Parse() {
>> + std::string FirstParagraph;
>> + std::string Brief;
>> + bool InFirstParagraph = true;
>> + bool InBrief = false;
>> + bool BriefDone = false;
>> +
>> + while (Tok.isNot(tok::eof)) {
>> + if (Tok.is(tok::text)) {
>> + if (InFirstParagraph)
>> + FirstParagraph += Tok.getText();
>> + if (InBrief)
>> + Brief += Tok.getText();
>> + ConsumeToken();
>> + continue;
>> + }
>> +
>> + if (!BriefDone && Tok.is(tok::command) && Tok.getCommandName() == "brief") {
>> + InBrief = true;
>> + ConsumeToken();
>> + continue;
>> + }
>> +
>> + if (Tok.is(tok::newline)) {
>> + if (InFirstParagraph)
>> + FirstParagraph += '\n';
>> + if (InBrief)
>> + Brief += '\n';
>> + ConsumeToken();
>> +
>> + if (Tok.is(tok::newline)) {
>> + ConsumeToken();
>> + // We found a paragraph end.
>> + InFirstParagraph = false;
>> + if (InBrief) {
>> + InBrief = false;
>> + BriefDone = true;
>> + }
>> + }
>> + continue;
>> + }
>> +
>> + // We didn't handle this token, so just drop it.
>> + ConsumeToken();
>> + }
>> +
>> + if (Brief.size() > 0)
>> + return Brief;
>> +
>> + return FirstParagraph;
>> +}
>
> I think you can do this with one string. Once you have at least one \brief command, you don't need to track the first paragraph anymore.
Thanks, this simplifies logic, r159247.
> And is it really valid to have more than one \brief command, or can we bail out of this loop early if we find a complete \brief?
Although Doxygen docs state that multiple \brief paragraph would be
merged, I cannot reproduce this behavior (it only uses the first one).
>> +/// Return the one past end pointer for C comments.
>> +/// Very dumb, does not handle escaped newlines or trigraphs.
>> +const char *findCCommentEnd(const char *BufferPtr, const char *BufferEnd) {
>> + for ( ; BufferPtr != BufferEnd; ++BufferPtr) {
>> + if (*BufferPtr == '*') {
>> + assert(BufferPtr + 1 != BufferEnd);
>> + if (*(BufferPtr + 1) == '/')
>> + return BufferPtr;
>> + }
>> + }
>> + llvm_unreachable("buffer end hit before '*/' was seen");
>> +}
>
> What happens in a file with unterminated comments? What happens in a file with unterminated comments where the very last character is '*'?
>
> (If the answer is "this function will never be called", I'm okay with that.)
An unterminated comment should not be passed from lexer to anyone.
I've added a test for that in r159267.
>> +
>> +void Lexer::lexCommentText(Token &T) {
>> + assert(CommentState == LCS_InsideBCPLComment ||
>> + CommentState == LCS_InsideCComment);
>> +
>> + switch (State) {
>> + case LS_Normal:
>> + break;
>> + case LS_VerbatimBlockFirstLine:
>> + lexVerbatimBlockFirstLine(T);
>> + return;
>> + case LS_VerbatimBlockBody:
>> + lexVerbatimBlockBody(T);
>> + return;
>> + case LS_HTMLOpenTag:
>> + lexHTMLOpenTag(T);
>> + return;
>> + }
>
> This seems rather asymmetrical. Can we just factor normal mode into its own function as well?
Normal state for lexing text is the hot path, I wanted to avoid a
function call here. I don't really have a preference, though, because
I didn't do any performance measurements yet.
>> + assert(State == LS_Normal);
>> +
>> + const char *TokenPtr = BufferPtr;
>> + assert(TokenPtr < CommentEnd);
>> + while (TokenPtr != CommentEnd) {
>> + switch(*TokenPtr) {
>> + case '\\':
>> + case '@': {
>> + TokenPtr++;
>> + if (TokenPtr == CommentEnd) {
>> + formTokenWithChars(T, TokenPtr, tok::text);
>> + T.setText(StringRef(BufferPtr - T.getLength(), T.getLength()));
>> + return;
>> + }
>
> This looks funny. Can you just keep a "frame pointer" to where BufferPtr was at the beginning of the function, rather than relying on it being updated by formTokenWithChars and then counting backwards?
I don't have a preference here. Done in r159269.
> Also, when the token body is trivially extractable from the entire token text, I'm not sure it's worth saving separately.)
I think that it is useful because the extracted text is the semantic
value of the token, which is used a lot in the parser and sema. I
don't think it is a good idea to require them to strip extra
characters -- that's what the lexer is for.
>> + default: {
>> + while (true) {
>> + TokenPtr++;
>> + if (TokenPtr == CommentEnd)
>> + break;
>> + char C = *TokenPtr;
>> + if(C == '\n' || C == '\r' ||
>> + C == '\\' || C == '@' || C == '<')
>> + break;
>> + }
>> + formTokenWithChars(T, TokenPtr, tok::text);
>> + T.setText(StringRef(BufferPtr - T.getLength(), T.getLength()));
>> + return;
>> + }
>
> Should this function be the place where leading stars are removed in C-style comments?
We can remove them there, but I would prefer to do this while
post-processing comment text.
Dmitri
--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
More information about the cfe-commits
mailing list