[cfe-commits] Patch for review - unicode step 1
Chris Lattner
clattner at apple.com
Sun Jul 12 14:43:36 PDT 2009
On Jul 10, 2009, at 1:30 PM, AlisdairM wrote:
> Attached is a patch for step 1 of my Unicode and enhanced literals
> support.
> Looking for approval to commit.
>
> For step 1, I updated StringLiteralParser to cope with all varieties
> of
> prefix for a literal, but have not yet started work on suffices for
> user-defined literals.
> Note that although the logic is here, there is no change in
> functionality
> yet! The next step is to update the lexer to pass along these new
> literals.
>
> As there is not supposed to be a change of functionality, there are
> no tests
> associated with this update.
This looks pretty nice to me, just a couple of things:
- // Remember if we see any wide strings.
- AnyWide |= StringToks[i].is(tok::wide_string_literal);
+ // Check for concatenating literals of different types
+ ThisTokBuf = &TokenBuf[0];
+ unsigned ThisTokLen = PP.getSpelling(StringToks[i], ThisTokBuf);
+ PrefixCode = DecodeStringLiteralPrefix(ThisTokBuf) & 3;
This can overflow the 'TokenBuf' buffer if the subsequent string was
longer than the first string. Just make sure to use something like:
TokenBuf.resize(StringToks[i].getLength());
before you call PP.getSpelling there. This also generates a warning
because "ThisTokLen" is dead.
+/// u8 literals are stored as char strings, but cannot promote. u8
can be
+/// checked for by testing the 3rd bit of the result.
+/// On return, the passed buffer pointer should be pointing at the
initial
+/// '"' character.
+static unsigned int DecodeStringLiteralPrefix(char const *& TokenBuf) {
Please use an enum to make the magic values explicit. Returning them
as an 'unsigned' is still fine, but it would be nice to have:
+ unsigned int Representation = PrefixCode & PREFIX_IS_U8;
instead of:
+ unsigned int Representation = PrefixCode & 3;
Also, in DecodeStringLiteralPrefix, please use |= instead of += to
emphasize that you're setting a bit (even though your code is
functionally correct).
A very trivial point is that I generally prefer "unsigned" instead of
"unsigned int", just because it is shorter and less redundant.
+ case 1: // char16_t
+ CodeUnitWidth = 2U; // FIXME : lookup correct width of char16_t
for target
+ break;
+ case 2: // wchar_t
+ CodeUnitWidth = PP.getTargetInfo().getWCharWidth();
+ assert((CodeUnitWidth & 7) == 0 && "Assumes wchar_t is byte
multiple!");
+ CodeUnitWidth /= 8;
+ break;
+ case 3: // char32_t
+ CodeUnitWidth = 4U; // FIXME : lookup correct width of char32_t
for target
+ break;
Please stub these to go through PP.getTargetInfo(), even if the
methods on targetinfo are hard coded to return 2/4 for now. That puts
the fixme in targetinfo, not in the lexer.
- hadError, StringToks[i].getLocation(),
ThisIsWide, PP);
+ hadError, StringToks[i].getLocation(),
RepKind > 0, PP);
it would be nice to use something less magic than "RepKind > 0" for
this. Maybe using enums can help.
Thanks for working on this Alisdair!
-Chris
More information about the cfe-commits
mailing list