[cfe-commits] Patch for review - unicode step 1
AlisdairM(public)
public at alisdairm.net
Mon Jul 13 23:41:08 PDT 2009
> -----Original Message-----
> From: Chris Lattner [mailto:clattner at apple.com]
> Sent: 12 July 2009 23:44
> To: AlisdairM
> Cc: cfe-commits at cs.uiuc.edu
> Subject: Re: [cfe-commits] Patch for review - unicode step 1
>
> On Jul 10, 2009, at 1:30 PM, AlisdairM wrote:
> > Attached is a patch for step 1 of my Unicode and enhanced literals
> > support.
> > Looking for approval to commit.
> >
> > For step 1, I updated StringLiteralParser to cope with all varieties
> > of
> > prefix for a literal, but have not yet started work on suffices for
> > user-defined literals.
> > Note that although the logic is here, there is no change in
> > functionality
> > yet! The next step is to update the lexer to pass along these new
> > literals.
> >
> > As there is not supposed to be a change of functionality, there are
> > no tests
> > associated with this update.
>
> This looks pretty nice to me, just a couple of things:
>
> - // Remember if we see any wide strings.
> - AnyWide |= StringToks[i].is(tok::wide_string_literal);
> + // Check for concatenating literals of different types
> + ThisTokBuf = &TokenBuf[0];
> + unsigned ThisTokLen = PP.getSpelling(StringToks[i], ThisTokBuf);
> + PrefixCode = DecodeStringLiteralPrefix(ThisTokBuf) & 3;
>
> This can overflow the 'TokenBuf' buffer if the subsequent string was
> longer than the first string. Just make sure to use something like:
> TokenBuf.resize(StringToks[i].getLength());
>
> before you call PP.getSpelling there. This also generates a warning
> because "ThisTokLen" is dead.
Thanks.
> +/// u8 literals are stored as char strings, but cannot promote. u8
> can be
> +/// checked for by testing the 3rd bit of the result.
> +/// On return, the passed buffer pointer should be pointing at the
> initial
> +/// '"' character.
> +static unsigned int DecodeStringLiteralPrefix(char const *& TokenBuf)
> {
>
> Please use an enum to make the magic values explicit. Returning them
> as an 'unsigned' is still fine, but it would be nice to have:
>
> + unsigned int Representation = PrefixCode & PREFIX_IS_U8;
> instead of:
> + unsigned int Representation = PrefixCode & 3;
Agreed, I was being a little lazy while these constants were internal to the cpp and a single function - ducking the naming question till the next step where they will likely surface in the header as well, but I guess there is no excuse for that kind of laziness!
Will name appropriately before committing.
> Also, in DecodeStringLiteralPrefix, please use |= instead of += to
> emphasize that you're setting a bit (even though your code is
> functionally correct).
OK
> A very trivial point is that I generally prefer "unsigned" instead of
> "unsigned int", just because it is shorter and less redundant.
Conventions often appear trivial, but it is always better to follow along. Thanks for the pointer.
> + case 1: // char16_t
> + CodeUnitWidth = 2U; // FIXME : lookup correct width of char16_t
> for target
> + break;
> + case 2: // wchar_t
> + CodeUnitWidth = PP.getTargetInfo().getWCharWidth();
> + assert((CodeUnitWidth & 7) == 0 && "Assumes wchar_t is byte
> multiple!");
> + CodeUnitWidth /= 8;
> + break;
> + case 3: // char32_t
> + CodeUnitWidth = 4U; // FIXME : lookup correct width of char32_t
> for target
> + break;
>
> Please stub these to go through PP.getTargetInfo(), even if the
> methods on targetinfo are hard coded to return 2/4 for now. That puts
> the fixme in targetinfo, not in the lexer.
As I just committed the Unicode types patch, these can actually be the real calls now rather than stubs.
> - hadError, StringToks[i].getLocation(),
> ThisIsWide, PP);
> + hadError, StringToks[i].getLocation(),
> RepKind > 0, PP);
>
> it would be nice to use something less magic than "RepKind > 0" for
> this. Maybe using enums can help.
I should probably be passing the actual bit-widths. Revised patch this evening.
> Thanks for working on this Alisdair!
Thanks for the review.
I'll try and make the updates tonight (just starting a fresh day here) so I can start hooking up the new character types to their literals tomorrow.
AlisdairM
More information about the cfe-commits
mailing list