[cfe-commits] Patch for review - unicode step 1

Mon Jul 13 23:41:08 PDT 2009

> -----Original Message-----
> From: Chris Lattner [mailto:clattner at apple.com]
> Sent: 12 July 2009 23:44
> To: AlisdairM
> Cc: cfe-commits at cs.uiuc.edu
> Subject: Re: [cfe-commits] Patch for review - unicode step 1
> 
> On Jul 10, 2009, at 1:30 PM, AlisdairM wrote:
> > Attached is a patch for step 1 of my Unicode and enhanced literals
> > support.
> > Looking for approval to commit.
> >
> > For step 1, I updated StringLiteralParser to cope with all varieties
> > of
> > prefix for a literal, but have not yet started work on suffices for
> > user-defined literals.
> > Note that although the logic is here, there is no change in
> > functionality
> > yet!  The next step is to update the lexer to pass along these new
> > literals.
> >
> > As there is not supposed to be a change of functionality, there are
> > no tests
> > associated with this update.
> 
> This looks pretty nice to me, just a couple of things:
> 
> -    // Remember if we see any wide strings.
> -    AnyWide |= StringToks[i].is(tok::wide_string_literal);
> +    // Check for concatenating literals of different types
> +    ThisTokBuf = &TokenBuf[0];
> +    unsigned ThisTokLen = PP.getSpelling(StringToks[i], ThisTokBuf);
> +    PrefixCode = DecodeStringLiteralPrefix(ThisTokBuf) & 3;
> 
> This can overflow the 'TokenBuf' buffer if the subsequent string was
> longer than the first string.  Just make sure to use something like:
>    TokenBuf.resize(StringToks[i].getLength());
> 
> before you call PP.getSpelling there.  This also generates a warning
> because "ThisTokLen" is dead.

Thanks.

> +/// u8 literals are stored as char strings, but cannot promote.  u8
> can be
> +/// checked for by testing the 3rd bit of the result.
> +/// On return, the passed buffer pointer should be pointing at the
> initial
> +/// '"' character.
> +static unsigned int DecodeStringLiteralPrefix(char const *& TokenBuf)
> {
> 
> Please use an enum to make the magic values explicit.  Returning them
> as an 'unsigned' is still fine, but it would be nice to have:
> 
> +  unsigned int Representation = PrefixCode & PREFIX_IS_U8;
> instead of:
> +  unsigned int Representation = PrefixCode & 3;

Agreed, I was being a little lazy while these constants were internal to the cpp and a single function - ducking the naming question till the next step where they will likely surface in the header as well, but I guess there is no excuse for that kind of laziness!

Will name appropriately before committing.

> Also, in DecodeStringLiteralPrefix, please use |= instead of += to
> emphasize that you're setting a bit (even though your code is
> functionally correct).

OK

> A very trivial point is that I generally prefer "unsigned" instead of
> "unsigned int", just because it is shorter and less redundant.

Conventions often appear trivial, but it is always better to follow along.  Thanks for the pointer.

> +  case 1:  // char16_t
> +    CodeUnitWidth = 2U; // FIXME : lookup correct width of char16_t
> for target
> +    break;
> +  case 2:  // wchar_t
> +    CodeUnitWidth = PP.getTargetInfo().getWCharWidth();
> +    assert((CodeUnitWidth & 7) == 0 && "Assumes wchar_t is byte
> multiple!");
> +    CodeUnitWidth /= 8;
> +    break;
> +  case 3:  // char32_t
> +    CodeUnitWidth = 4U; // FIXME : lookup correct width of char32_t
> for target
> +    break;
> 
> Please stub these to go through PP.getTargetInfo(), even if the
> methods on targetinfo are hard coded to return 2/4 for now.  That puts
> the fixme in targetinfo, not in the lexer.

As I just committed the Unicode types patch, these can actually be the real calls now rather than stubs.

> -                         hadError, StringToks[i].getLocation(),
> ThisIsWide, PP);
> +                         hadError, StringToks[i].getLocation(),
> RepKind > 0, PP);
> 
> it would be nice to use something less magic than "RepKind > 0" for
> this.  Maybe using enums can help.

I should probably be passing the actual bit-widths.  Revised patch this evening.

> Thanks for working on this Alisdair!

Thanks for the review.
I'll try and make the updates tonight (just starting a fresh day here) so I can start hooking up the new character types to their literals tomorrow.

AlisdairM