[cfe-dev] [PATCH] C++0x unicode string and character literals now with test cases

Wed Jul 27 08:22:49 PDT 2011

On Jul 26, 2011, at 11:07 PM, Craig Topper wrote:

> Here's the patch to update the support page.

r136216, thanks!

	- Doug

> On Tue, Jul 26, 2011 at 10:41 PM, Douglas Gregor <dgregor at apple.com> wrote:
>> 
>> On Jul 26, 2011, at 8:39 AM, Craig Topper wrote:
>> 
>>> Here's a new patch addressing Chris' suggestions.
>> 
>> Great! I've committed this as r136210, thanks so much!
>> 
>>        - Doug
>> 
>>> 
>>> On Mon, Jul 25, 2011 at 11:14 PM, Chris Lattner <clattner at apple.com> wrote:
>>>> 
>>>> On Jul 25, 2011, at 10:52 PM, Craig Topper wrote:
>>>> 
>>>>> Doh! I accidentally warned on UCNs larger than 0xffff in UTF-32
>>>>> character literals. This patch fixes that.
>>>> 
>>>> Hi Craig,
>>>> 
>>>> Doug is definitely the best one to review the semantics of this.  Some minor comments:
>>>> 
>>>>   IdentifierInfo &get(StringRef Name, tok::TokenKind TokenCode) {
>>>>     IdentifierInfo &II = get(Name);
>>>> +    assert(TokenCode < 512 && "TokenCode too large");
>>>>     II.TokenID = TokenCode;
>>>>     return II;
>>>> 
>>>> I would suggest instead:
>>>> 
>>>>   IdentifierInfo &get(StringRef Name, tok::TokenKind TokenCode) {
>>>>     IdentifierInfo &II = get(Name);
>>>>     II.TokenID = TokenCode;
>>>> +    assert(II.TokenID == TokenCode && "TokenCode too large");
>>>>     return II;
>>>> 
>>>> to avoid tying the '9' bit bitfield size to the magic 512 constant.
>>>> 
>>>> 
>>>>  class StringLiteral : public Expr {
>>>> ...
>>>>   unsigned ByteLength;
>>>> -  bool IsWide;
>>>> +  StringKind Kind;
>>>>   bool IsPascal;
>>>>   unsigned NumConcatenated;
>>>> 
>>>> sizeof(StringKind) is likely to be 4, wasting space.  I'd suggest making it an 8 bit bitfield or something.
>>>> 
>>>> 
>>>> In SemaDeclAttr etc:
>>>> 
>>>> -    if (Str == 0 || Str->isWide()) {
>>>> +    if (Str == 0 || Str->getKind() != StringLiteral::Ascii) {
>>>>       S.Diag(Attr.getLoc(), diag::err_attribute_argument_n_not_string)
>>>>           << "weakref" << 1;
>>>> 
>>>> I'd suggest introducing the proper helper methods and using:
>>>> 
>>>> -    if (Str == 0 || Str->isWide()) {
>>>> +    if (Str == 0 || !Str->isAscii()) {
>>>>       S.Diag(Attr.getLoc(), diag::err_attribute_argument_n_not_string)
>>>>           << "weakref" << 1;
>>>> 
>>>> In Lexer.cpp:
>>>> 
>>>> +    // treat U like the start of an identifier.
>>>> +    goto StartIdentifier;
>>>> 
>>>> Instead of doing this, I'd replace the gotos with just "return LexIdentifier(Result, CurPtr);" since "MIOpt.ReadToken" has already been done.
>>>> 
>>>> I would suggest fusing IsIdentifierL and IsIdentifierUTFStringPrefix into one function.
>>>> 
>>>> Otherwise, LGTM!
>>>> 
>>>> -Chris
>>>> 
>>>> 
>>>> 
>>> <unicode_strings.patch>
>> 
>> 
> 
> 
> 
> -- 
> ~Craig
> <unicode_support.patch>