[cfe-dev] [PATCH] C++0x unicode string and character literals now with test cases

Tue Jul 26 22:41:25 PDT 2011

On Jul 26, 2011, at 8:39 AM, Craig Topper wrote:

> Here's a new patch addressing Chris' suggestions.

Great! I've committed this as r136210, thanks so much!

	- Doug

> 
> On Mon, Jul 25, 2011 at 11:14 PM, Chris Lattner <clattner at apple.com> wrote:
>> 
>> On Jul 25, 2011, at 10:52 PM, Craig Topper wrote:
>> 
>>> Doh! I accidentally warned on UCNs larger than 0xffff in UTF-32
>>> character literals. This patch fixes that.
>> 
>> Hi Craig,
>> 
>> Doug is definitely the best one to review the semantics of this.  Some minor comments:
>> 
>>   IdentifierInfo &get(StringRef Name, tok::TokenKind TokenCode) {
>>     IdentifierInfo &II = get(Name);
>> +    assert(TokenCode < 512 && "TokenCode too large");
>>     II.TokenID = TokenCode;
>>     return II;
>> 
>> I would suggest instead:
>> 
>>   IdentifierInfo &get(StringRef Name, tok::TokenKind TokenCode) {
>>     IdentifierInfo &II = get(Name);
>>     II.TokenID = TokenCode;
>> +    assert(II.TokenID == TokenCode && "TokenCode too large");
>>     return II;
>> 
>> to avoid tying the '9' bit bitfield size to the magic 512 constant.
>> 
>> 
>>  class StringLiteral : public Expr {
>> ...
>>   unsigned ByteLength;
>> -  bool IsWide;
>> +  StringKind Kind;
>>   bool IsPascal;
>>   unsigned NumConcatenated;
>> 
>> sizeof(StringKind) is likely to be 4, wasting space.  I'd suggest making it an 8 bit bitfield or something.
>> 
>> 
>> In SemaDeclAttr etc:
>> 
>> -    if (Str == 0 || Str->isWide()) {
>> +    if (Str == 0 || Str->getKind() != StringLiteral::Ascii) {
>>       S.Diag(Attr.getLoc(), diag::err_attribute_argument_n_not_string)
>>           << "weakref" << 1;
>> 
>> I'd suggest introducing the proper helper methods and using:
>> 
>> -    if (Str == 0 || Str->isWide()) {
>> +    if (Str == 0 || !Str->isAscii()) {
>>       S.Diag(Attr.getLoc(), diag::err_attribute_argument_n_not_string)
>>           << "weakref" << 1;
>> 
>> In Lexer.cpp:
>> 
>> +    // treat U like the start of an identifier.
>> +    goto StartIdentifier;
>> 
>> Instead of doing this, I'd replace the gotos with just "return LexIdentifier(Result, CurPtr);" since "MIOpt.ReadToken" has already been done.
>> 
>> I would suggest fusing IsIdentifierL and IsIdentifierUTFStringPrefix into one function.
>> 
>> Otherwise, LGTM!
>> 
>> -Chris
>> 
>> 
>> 
> <unicode_strings.patch>