[cfe-dev] [PATCH] C++0x unicode string and character literals now with test cases

Mon Jul 25 23:14:03 PDT 2011

On Jul 25, 2011, at 10:52 PM, Craig Topper wrote:

> Doh! I accidentally warned on UCNs larger than 0xffff in UTF-32
> character literals. This patch fixes that.

Hi Craig,

Doug is definitely the best one to review the semantics of this.  Some minor comments:

   IdentifierInfo &get(StringRef Name, tok::TokenKind TokenCode) {
     IdentifierInfo &II = get(Name);
+    assert(TokenCode < 512 && "TokenCode too large");
     II.TokenID = TokenCode;
     return II;

I would suggest instead:

   IdentifierInfo &get(StringRef Name, tok::TokenKind TokenCode) {
     IdentifierInfo &II = get(Name);
     II.TokenID = TokenCode;
+    assert(II.TokenID == TokenCode && "TokenCode too large");
     return II;

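to avoid tying the 9-bit bitfield size to the magic 512 constant. The assert then fires whenever the assignment truncates the value, whatever the width of the field happens to be.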
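
A minimal standalone sketch of that round-trip check, in case it helps. The struct and function names here are made up for illustration (this is not the real IdentifierInfo), and the 9-bit width is just the one mentioned above:

```cpp
#include <cassert>

// Hypothetical stand-in for IdentifierInfo; only the bitfield matters here.
struct TokenInfoSketch {
  unsigned TokenID : 9; // width assumed from the discussion above
};

// Returns true if Value survives a round trip through the bitfield.
inline bool fitsInTokenID(unsigned Value) {
  TokenInfoSketch I;
  I.TokenID = Value;         // unsigned bitfield: high bits are dropped
  return I.TokenID == Value; // false if the store truncated the value
}
```

If the field is later widened or narrowed, the comparison keeps working without touching any constant.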

 class StringLiteral : public Expr {
...
   unsigned ByteLength;
-  bool IsWide;
+  StringKind Kind;
   bool IsPascal;
   unsigned NumConcatenated;

sizeof(StringKind) is likely to be 4, which wastes space.  I'd suggest making it an 8-bit bitfield or something similar.
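
Roughly what I have in mind, as a sketch only. The struct names are invented, the enumerators other than Ascii are assumed, and turning IsPascal into a bitfield as well is my own addition so that the two fields can share a storage unit. Exact sizes depend on the ABI, so no numbers are claimed here:

```cpp
#include <cassert>

// Enumerators other than Ascii are assumptions for this sketch.
enum StringKindSketch { Ascii, Wide, UTF8, UTF16, UTF32 };

// Layout as in the patch: the enum takes a full word on most targets.
struct UnpackedFields {
  unsigned ByteLength;
  StringKindSketch Kind;
  bool IsPascal;
  unsigned NumConcatenated;
};

// Kind stored in an 8-bit bitfield next to a 1-bit IsPascal.
struct PackedFields {
  unsigned ByteLength;
  unsigned Kind : 8;
  unsigned IsPascal : 1;
  unsigned NumConcatenated;

  // Convert back to the enum at the accessor boundary.
  StringKindSketch getKind() const {
    return static_cast<StringKindSketch>(Kind);
  }
};
```

Storing the kind as unsigned and casting in the accessor avoids any question about the signedness of an enum bitfield.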

In SemaDeclAttr etc:

-    if (Str == 0 || Str->isWide()) {
+    if (Str == 0 || Str->getKind() != StringLiteral::Ascii) {
       S.Diag(Attr.getLoc(), diag::err_attribute_argument_n_not_string)
           << "weakref" << 1;

I'd suggest introducing the proper helper methods and using:

-    if (Str == 0 || Str->isWide()) {
+    if (Str == 0 || !Str->isAscii()) {
       S.Diag(Attr.getLoc(), diag::err_attribute_argument_n_not_string)
           << "weakref" << 1;
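
The helpers could look something like the sketch below. This is a cut-down, hypothetical class, not the real StringLiteral; only getKind(), isAscii() and isWide() appear in the discussion above, and the remaining enumerator and method names are assumptions:

```cpp
#include <cassert>

// Hypothetical, cut-down version of the class, for illustration only.
class StringLiteralSketch {
public:
  enum StringKind { Ascii, Wide, UTF8, UTF16, UTF32 };

  explicit StringLiteralSketch(StringKind K) : Kind(K) {}

  StringKind getKind() const { return static_cast<StringKind>(Kind); }

  // One predicate per kind keeps call sites short and readable.
  bool isAscii() const { return Kind == Ascii; }
  bool isWide() const { return Kind == Wide; }
  bool isUTF8() const { return Kind == UTF8; }
  bool isUTF16() const { return Kind == UTF16; }
  bool isUTF32() const { return Kind == UTF32; }

private:
  unsigned Kind : 8;
};
```

Callers that only accept plain narrow strings can then write !Str->isAscii() instead of comparing against the enum directly.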

In Lexer.cpp:

+    // treat U like the start of an identifier.
+    goto StartIdentifier;

Instead of doing this, I'd replace the gotos with just "return LexIdentifier(Result, CurPtr);" since MIOpt.ReadToken has already been called at that point.

I would suggest fusing IsIdentifierL and IsIdentifierUTFStringPrefix into one function.
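
A rough sketch of what a fused check might look like. I am assuming here that the two functions test an identifier's spelling against "L" and against "u8", "u", "U" respectively; the function name, the string parameter and the CPlusPlus0x flag are all hypothetical and stand in for whatever the real code takes:

```cpp
#include <cassert>
#include <string>

// Hypothetical fused check: true if Spelling is a prefix that can start a
// wide or Unicode string/character literal. Operating on a plain spelling
// string is an assumption made to keep the sketch self-contained.
inline bool isStringPrefixSketch(const std::string &Spelling,
                                 bool CPlusPlus0x) {
  if (Spelling == "L")
    return true; // wide prefix, valid in every mode
  if (!CPlusPlus0x)
    return false; // the Unicode prefixes only exist in C++0x
  return Spelling == "u8" || Spelling == "u" || Spelling == "U";
}
```

With one function, every caller gets the same answer for all four prefixes and there is only one place to update if more are added.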

Otherwise, LGTM!

-Chris