[cfe-dev] Review request: Patch for adding raw string support to clang
Cédric Venet
cedric.venet at laposte.net
Sat Jun 28 01:58:40 PDT 2008
Hi,
Eli Friedman a écrit :
> On Thu, Jun 26, 2008 at 6:05 AM, Cédric Venet <cedric.venet at laposte.net> wrote:
>
>> DIAG(err_unterminated_string, ERROR,
>> "missing terminating \" character")
>> +DIAG(err_too_much_dchars_rawstring, ERROR,
>> + "The d-chars part of the string is too long (16 max)")
>>
>
> Is a separate diagnostic for a missing d-chars sequence at the end necessary?
>
You can't tell if it is a missing dchar sequence or if it is just part
of the string, so I don't see how a dignostic could be emitted. Consider:
R"abc[abc]ab]"c]abc" => legal string "abc]ab]\"c" (in clasical form)
>> +
>> + char dchar_seq[16];
>> + int dchar_seq_len = 0;
>> +
>> + // first read the optional d-char-sequence (0 to 16 characters)
>> + char C = getAndAdvanceChar(CurPtr, Result);
>> +
>> + while (C != '[') {
>> + // FIXME: check the characters are in the allowed set
>> + if(dchar_seq_len>=16) {
>> + Diag(BufferPtr, diag::err_too_much_dchars_rawstring);
>> + // TO REVIEW: should we attempt to recuperate on error here?
>>
>
> It would be ideal if the code supported more than 16 chars, but
> otherwise lexing is essentially screwed anyway. Anything sane is
> fine.
>
I didn't choose 16, it comes from the C++ proposal (accepted):
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2442.htm:
"A /d-char-sequence/ shall consist of at most 16 characters."
It would be hard to lift this limitation but, since this is in the
standard, I think it is better like this?
>> + Result.setKind(tok::unknown);
>> + FormTokenWithChars(Result, CurPtr-1);
>> + return;
>> + }
>> + dchar_seq[dchar_seq_len++] = C;
>> + C = getAndAdvanceChar(CurPtr, Result);
>>
>
> Don't you need to check for EOF/embedded null here?
>
yes, done.
>
>> + }
>> + // skip the '['
>> + C = getAndAdvanceChar(CurPtr, Result);
>> + while(1) {
>> + while (C != ']') {
>> + //if( a backslash \ followed by a u or U
>>
>
> Mmm, definitely need to handle this case. And also the escaped-newline case.
>
This is handled latter in fact, in
StringLiteralParser::StringLiteralParser (litteralSupport.cpp). In fact,
since the UCN aren't handled in the standard string, I didn't handle
them in raw string at the moment. it assert if it encouter them.
>
>> + // FIXME: what about raw string and trigraph, escaped end of line ...??
>>
>
> I don't follow; would these somehow make the string grow? Or is there
> some other issue?
No, I was wondering what to do of a trigaph or an escaped eol in a raw
string? translate them like it is actually done or let them as is. I
don't know the preprocessor and lexer phases so I am not sure what is
the correct way. I think this is ok, so I deleted the comment.
>
>> + case tok::raw_string_literal: // TO REVIEW: objC and C++0x possible?
>>
>
> What's the question?
>
I was wondering if we could encounter a C++0x specific token in an objC
parser function... Anyways, adding the case is the safe alternative and
don't cost much.
> Note that this isn't a complete review.
>
> -Eli
>
Thanks for taking the time. I attached a modified patch. And I changed
the option for thunderbird, so it should be correctly attached.
--
Cédric
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rawstringliteral_review2.patch
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20080628/8122efba/attachment.ksh>
More information about the cfe-dev
mailing list