[cfe-dev] Review request: Patch for adding raw string support to clang

Sat Jun 28 01:58:40 PDT 2008

Hi,

Eli Friedman a écrit :
> On Thu, Jun 26, 2008 at 6:05 AM, Cédric Venet <cedric.venet at laposte.net> wrote:
>   
>>  DIAG(err_unterminated_string, ERROR,
>>      "missing terminating \" character")
>> +DIAG(err_too_much_dchars_rawstring, ERROR,
>> +        "The d-chars part of the string is too long (16 max)")
>>     
>
> Is a separate diagnostic for a missing d-chars sequence at the end necessary?
>   

You can't tell if it is a missing dchar sequence or if it is just part 
of the string, so I don't see how a dignostic could be emitted. Consider:
R"abc[abc]ab]"c]abc" => legal string "abc]ab]\"c" (in clasical form)

>> +
>> +  char dchar_seq[16];
>> +  int  dchar_seq_len = 0;
>> +
>> +  // first read the optional d-char-sequence (0 to 16 characters)
>> +  char C = getAndAdvanceChar(CurPtr, Result);
>> +
>> +  while (C != '[') {
>> +    // FIXME: check the characters are in the allowed set
>> +    if(dchar_seq_len>=16) {
>> +      Diag(BufferPtr, diag::err_too_much_dchars_rawstring);
>> +      // TO REVIEW: should we attempt to recuperate on error here?
>>     
>
> It would be ideal if the code supported more than 16 chars, but
> otherwise lexing is essentially screwed anyway.  Anything sane is
> fine.
>   

I didn't choose 16, it comes from the C++ proposal (accepted): 
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2442.htm:
"A /d-char-sequence/ shall consist of at most 16 characters."
It would be hard to lift this limitation but, since this is in the 
standard, I think it is better like this?

>> +      Result.setKind(tok::unknown);
>> +      FormTokenWithChars(Result, CurPtr-1);
>> +      return;
>> +    }
>> +    dchar_seq[dchar_seq_len++] = C;
>> +    C = getAndAdvanceChar(CurPtr, Result);
>>     
>
> Don't you need to check for EOF/embedded null here?
>   

yes, done.

>   
>> +  }
>> +  // skip the '['
>> +  C = getAndAdvanceChar(CurPtr, Result);
>> +  while(1) {
>> +    while (C != ']') {
>> +      //if( a backslash \ followed by a u or U
>>     
>
> Mmm, definitely need to handle this case.  And also the escaped-newline case.
>   

This is handled latter in fact, in 
StringLiteralParser::StringLiteralParser (litteralSupport.cpp). In fact, 
since the UCN aren't handled in the standard string, I didn't handle 
them in raw string at the moment. it assert if it encouter them.

>   
>> +    // FIXME: what about raw string and trigraph, escaped end of line ...??
>>     
>
> I don't follow; would these somehow make the string grow?  Or is there
> some other issue?  

No, I was wondering what to do of a trigaph or an escaped eol in a raw 
string? translate them like it is actually done or let them as is. I 
don't know the preprocessor and lexer phases so I am not sure what is 
the correct way. I think this is ok, so I deleted the comment.

>   
>> +  case tok::raw_string_literal: // TO REVIEW: objC and C++0x possible?
>>     
>
> What's the question?
>   

I was wondering if we could encounter a C++0x specific token in an objC 
parser function... Anyways, adding the case is the safe alternative and 
don't cost much.

> Note that this isn't a complete review.
>
> -Eli
>   

Thanks for taking the time. I attached a modified patch. And I changed 
the option for thunderbird, so it should be correctly attached.

-- 
Cédric
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rawstringliteral_review2.patch
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20080628/8122efba/attachment.ksh>