[cfe-dev] Review request: Patch for adding raw string support to clang

Eli Friedman eli.friedman at gmail.com
Thu Jun 26 18:53:33 PDT 2008


On Thu, Jun 26, 2008 at 6:05 AM, Cédric Venet <cedric.venet at laposte.net> wrote:
>  DIAG(err_unterminated_string, ERROR,
>      "missing terminating \" character")
> +DIAG(err_too_much_dchars_rawstring, ERROR,
> +        "The d-chars part of the string is too long (16 max)")

Is a separate diagnostic for a missing d-chars sequence at the end necessary?

>  TOK(string_literal)      // "foo"
>  TOK(wide_string_literal) // L"foo"
>  TOK(angle_string_literal)// <foo>
> +TOK(raw_string_literal)  // R"**[foo]**" (N2442), support for u, U, u8 and
> L?

Put the question part of the comment on a separate line.

> +void Lexer::LexRawStringLiteral(Token &Result, const char *CurPtr, bool
> Wide) {
> +/* TO REVIEW: should I let this comment here?

Yes, it's good to have the grammar in the code as a reference for what
the code is trying to do.

> +raw-string:
> +  "d-char-sequence opt [r-char-sequenceopt ]d-char-sequenceopt "
> +
> +r-char-sequence:
> +    r-char
> +    r-char-sequence r-char
> +
> +r-char:
> +    any member of the source character set, except, (1), a backslash \
> followed by a u or U, or, (2), a right square bracket ]
> +                        followed by the initial d-char-sequence (which may
> be empty) followed by a double quote ".
> +    universal-character-name
> +
> +d-char-sequence:
> +    d-char
> +    d-char-sequence d-char
> +
> +d-char:
> +    any member of the basic source character set, except space, the left
> square bracket [, the right square bracket ],
> +                        or the control characters representing horizontal
> tab, vertical tab, form feed, or new-line.
> +*/
> +
> +  char dchar_seq[16];
> +  int  dchar_seq_len = 0;
> +
> +  // first read the optional d-char-sequence (0 to 16 characters)
> +  char C = getAndAdvanceChar(CurPtr, Result);
> +
> +  while (C != '[') {
> +    // FIXME: check the characters are in the allowed set
> +    if(dchar_seq_len>=16) {
> +      Diag(BufferPtr, diag::err_too_much_dchars_rawstring);
> +      // TO REVIEW: should we attempt to recuperate on error here?

It would be ideal if the code supported more than 16 chars, but
otherwise lexing is essentially screwed anyway.  Anything sane is
fine.

> +      Result.setKind(tok::unknown);
> +      FormTokenWithChars(Result, CurPtr-1);
> +      return;
> +    }
> +    dchar_seq[dchar_seq_len++] = C;
> +    C = getAndAdvanceChar(CurPtr, Result);

Don't you need to check for EOF/embedded null here?

> +  }
> +  // skip the '['
> +  C = getAndAdvanceChar(CurPtr, Result);
> +  while(1) {
> +    while (C != ']') {
> +      //if( a backslash \ followed by a u or U

Mmm, definitely need to handle this case.  And also the escaped-newline case.

> +    // TO REVIEW: enable this only for C++0x, or any language with
> extension
> +    // activated? or add a features like the pascal string?

Let's stick with just C++0x mode for the moment; this has the
potential to break existing code that uses R as a macro.

> +// TO REVIEW: how to do this properly? if not followed by a ", need to
> +// unconsume the char, perhaps saving the curptr and sizetmp var is enough?
> +/*    else if(Char == 'R') {
> +      ConsumeChar(
> +      if(getCharAndSize(CurPtr,SizeTmp)=='"')
> +      return LexRawStringLiteral(Result, ConsumeChar(CurPtr, SizeTmp,
> Result),
> +                                 true);
> +*/

If I'm not mistaken, you can actually just fall through from here, and
it will still work correctly.

> +    // FIXME: what about raw string and trigraph, escaped end of line ...??

I don't follow; would these somehow make the string grow?  Or is there
some other issue?

>     // Check if this is a pascal string
>     if (pp.getLangOptions().PascalStrings && ThisTokBuf + 1 != ThisTokEnd &&
>         ThisTokBuf[0] == '\\' && ThisTokBuf[1] == 'p') {
> +      // TO REVIEW: has a pascal raw string a meaning?

I'd say don't make raw pascal strings; escapes aren't generally
accepted in raw strings, so allowing \p would be confusing.

> +  case tok::raw_string_literal: // TO REVIEW: objC and C++0x possible?

What's the question?

Note that this isn't a complete review.

-Eli




More information about the cfe-dev mailing list