[cfe-dev] Review request: Patch for adding raw string support to clang

Mon Jun 30 22:28:53 PDT 2008

On Jun 28, 2008, at 1:58 AM, Cédric Venet wrote:

Hi Cédric,

I know very little about this feature and haven't studied the wording,  
but here are some thoughts.  This is actually a pretty nifty feature!

>>>
>>> +  case tok::raw_string_literal: // TO REVIEW: objC and C++0x  
>>> possible?
>>>
>>
>> What's the question?
>>
>
> I was wondering if we could encounter a C++0x specific token in an  
> objC parser function... Anyways, adding the case is the safe  
> alternative and don't cost much.

Yes, you definitely can.  The "Objective C++" language contains a  
superset of C++ and Objective C features.  However, I don't think we  
should allow @R"...", please do not accept this.

> Thanks for taking the time. I attached a modified patch. And I  
> changed the option for thunderbird, so it should be correctly  
> attached.

ok:

+DIAG(err_too_much_dchars_rawstring, ERROR,
+	 "The d-chars part of the string is too long (16 max)")

What are d-chars?  This is great for someone well versed in  
standardese, but not very useful for an end user.  How about "raw  
string prefix too long"?

+DIAG(err_invalid_dchar, ERROR,
+	 "The d-chars part of a raw string should not contain space, left  
square"
+     " bracket [, right square bracket ], horizontal tab, vertical  
tab,"
+     " form feed or new-line")

This is too long.  A diagnostic should be ~65 chars max with very few  
exceptions.

Also, diagnostics should not be sentences, they should be all lower  
case.


+TOK(raw_string_literal)  // R"**[foo]**" (N2442)

N2442 should be a more structured reference.  How about "C++ working  
paper N2442"?


+void Lexer::LexRawStringLiteral(Token &Result, const char *CurPtr,  
bool Wide) {
+/* from http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2442.htm
+raw-string:
+  "d-char-sequence_opt[r-char-sequence_opt]d-char-sequence_opt"

The comment should go in a doxygen (///) comment above the method like  
other lexer methods.




+      // TO REVIEW: Is this the corect way to return from error?
+      // should we attempt to recuperate on error here?
+      Result.setKind(tok::unknown);
+      FormTokenWithChars(Result, CurPtr-1);
+      return;

This is the right thing to do.

+    } else if (C == ' ' || C == ']' || C == '\t' || C == '\n' || C ==  
'\f'
+               || C == '\v' || C == '\r') {
+      // TO REVIEW: how to set the error carret on the erroneous char?
+      Diag(BufferPtr, diag::err_invalid_dchar);

This should do the right thing, please write a testcase.  If this is  
one character *after* the intended one, just safe BufferPtr from  
before the call to getAndAdvanceChar and report it at that "const  
char*" location.

+      // TO REVIEW: Is this the corect way to return from error?
+      // should we attempt to recuperate on error here?

this is correct.



@@ -1294,6 +1399,18 @@
        return LexStringLiteral(Result, ConsumeChar(CurPtr, SizeTmp,  
Result),
                                true);

+    if(Char == 'R') {
+      const char* NextCharPtr=CurPtr+SizeTmp;
+      unsigned int NextSizeTmp;

This should have a comment explaining what it is:  e.g. LR"**[wide  
long string]**"

Please add a space after the 'if'.

+      if(NextChar=='"') {

This one too.

+        // TO REVIEW: do we need to consume each char of the token?
+        ConsumeChar(CurPtr, SizeTmp, Result);

Please write a testcase to verify :)

+        return LexRawStringLiteral(Result, ConsumeChar(NextCharPtr,  
NextSizeTmp,Result),
+                                   true);

80 columns.


+++ lib/Lex/LiteralSupport.cpp	(working copy)
@@ -608,7 +608,8 @@
        MaxTokenLength = StringToks[i].getLength();

      // Remember if we see any wide strings.
-    AnyWide |= StringToks[i].is(tok::wide_string_literal);
+    AnyWide |= StringToks[i].is(tok::wide_string_literal)


The grammar above the StringLiteralParser ctor should be updated.

It seems that the normal and raw string code in LiteralSupport are  
pretty distinct.  How about splitting them out into two separate  
methods of StringLiteralParser?



Please add several testcases for this feature to the clang testcase,  
verifying that the parser accepts valid cases and rejects invalid  
ones.  Also, please add a simple codegen test.  If you need help with  
these, please let me know.

-Chris