[cfe-commits] r68076 - in /cfe/trunk: include/clang/Basic/DiagnosticLexKinds.td lib/Lex/LiteralSupport.cpp test/Sema/ucn-cstring.c
steve naroff
snaroff at apple.com
Tue Mar 31 11:10:05 PDT 2009
On Mar 31, 2009, at 1:52 PM, Chris Lattner wrote:
>
> On Mar 30, 2009, at 4:46 PM, Steve Naroff wrote:
>
>> Author: snaroff
>> Date: Mon Mar 30 18:46:03 2009
>> New Revision: 68076
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=68076&view=rev
>> Log:
>> Implement UCN support for C string literals (C99 6.4.3) and add
>> some very basic tests. Chris Goller has graciously offered to write
>> some test to help validate UCN support.
>
> Very nice Steve, this turned out quite elegant!
>
I was glad to avoid using Unix iconv (which is heavy) or copying all
of http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c.
Separating ProcessUCNEscape from ProcessCharEscape also helped. I
wanted to avoid disrupting/complicating ProcessCharEscape.
As we add more support for UCN's, I'm sure things will get more
complex. For example, the code generation for @-strings requires we
convert from UTF-8 to UTF-16.
We can grow this as needed...
> Some more minor things:
>
>>
>> -
>> +/// ProcessUCNEscape - Read the Universal Character Name, check
>> constraints and
>> +/// convert the UTF32 to UTF8. This is a subroutine of
>> StringLiteralParser.
>> +/// When we decide to implement UCN's for character constants and
>> identifiers,
>> +/// we will likely rework our support for UCN's.
>> +static void ProcessUCNEscape(const char *&ThisTokBuf, const char
>> *ThisTokEnd,
>> + char *&ResultBuf, const char
>> *ResultBufEnd,
>> + bool &HadError,
>> + SourceLocation Loc, Preprocessor &PP) {
>
>>
>> + typedef unsigned int UTF32;
>
> How about typedef uint32_t UTF32?
Sure...
>
>
>> + // If we didn't consume the proper number of digits, there is a
>> problem.
>> + if (UcnLen) {
>> + PP.Diag(Loc, diag::err_ucn_escape_incomplete);
>
> Please use AdvanceToTokenCharacter so that the caret points exactly
> to the right place in string diagnostics. See
> err_exponent_has_no_digits and friends for an example.
>
Sounds good...
>> + // Now that we've parsed/checked the UCN, we convert from UTF32-
>> >UTF8.
>> + // The conversion below was inspired by:
>> + // http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c
>> + // First, we determine how many bytes the result will require.
>> + typedef unsigned char UTF8;
>
> uint8_t ?
Sure...
>
>
>> + if (ThisTokBuf[1] == 'u' || ThisTokBuf[1] == 'U') {
>> + ProcessUCNEscape(ThisTokBuf, ThisTokEnd, ResultPtr,
>> + GetString() + ResultBuf.size(),
>> + hadError, StringToks[i].getLocation(), PP);
>
> In an effort to unnest the else block, how about using a continue
> here?
>
O.k.
>>
>> +int main(void) {
>> + printf("%s (%d)\n", "hello \u2192 \u2603 \u2190 world",
>> sizeof("hello \u2192 \u2603 \u2190 world"));
>> + printf("%s (%d)\n", "\U00010400\U0001D12B",
>> sizeof("\U00010400\U0001D12B"));
>> + // Some error conditions...
>> + printf("%s\n", "\U"); // expected-error{{\u used with no
>> following hex digits}}
>> + printf("%s\n", "\U00"); // expected-error{{incomplete universal
>> character name}}
>> + printf("%s\n", "\U0001"); // expected-error{{incomplete
>> universal character name}}
>> + printf("%s\n", "\u0001"); // expected-error{{invalid universal
>> character}}
>
> Very nice!
>
> -Chris
More information about the cfe-commits
mailing list