[cfe-commits] r68076 - in /cfe/trunk: include/clang/Basic/DiagnosticLexKinds.td lib/Lex/LiteralSupport.cpp test/Sema/ucn-cstring.c

Tue Mar 31 11:10:05 PDT 2009

On Mar 31, 2009, at 1:52 PM, Chris Lattner wrote:

>
> On Mar 30, 2009, at 4:46 PM, Steve Naroff wrote:
>
>> Author: snaroff
>> Date: Mon Mar 30 18:46:03 2009
>> New Revision: 68076
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=68076&view=rev
>> Log:
>> Implement UCN support for C string literals (C99 6.4.3) and add
>> some very basic tests. Chris Goller has graciously offered to write
>> some tests to help validate UCN support.
>
> Very nice Steve, this turned out quite elegant!
>

I was glad to avoid using Unix iconv (which is heavy) or copying all  
of http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c.

Separating ProcessUCNEscape from ProcessCharEscape also helped. I  
wanted to avoid disrupting/complicating ProcessCharEscape.

As we add more support for UCNs, I'm sure things will get more
complex. For example, code generation for @-strings requires that we
convert from UTF-8 to UTF-16.

We can grow this as needed...
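
To make the @-string case a bit more concrete: once the UTF-8 has been
decoded back to a code point, the UTF-16 side is small. A rough sketch
only; EncodeUTF16 is a hypothetical name and none of this is in the
patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): encodes one code point as UTF-16.
// Writes 1 or 2 code units to Out and returns the count, or 0 if the code
// point is a surrogate or lies above U+10FFFF.
static unsigned EncodeUTF16(uint32_t CodePoint, uint16_t *Out) {
  assert(Out && "output buffer must not be null");
  if (CodePoint > 0x10FFFF || (CodePoint >= 0xD800 && CodePoint <= 0xDFFF))
    return 0;

  // Code points in the Basic Multilingual Plane fit in a single unit.
  if (CodePoint < 0x10000) {
    Out[0] = static_cast<uint16_t>(CodePoint);
    return 1;
  }

  // Everything else becomes a surrogate pair: subtract 0x10000, then split
  // the remaining 20 bits into a high half and a low half.
  CodePoint -= 0x10000;
  Out[0] = static_cast<uint16_t>(0xD800 + (CodePoint >> 10));
  Out[1] = static_cast<uint16_t>(0xDC00 + (CodePoint & 0x3FF));
  return 2;
}
```

With this, U+10400 from the test file would come out as the pair
D801 DC00.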

> Some more minor things:
>
>>
>> -
>> +/// ProcessUCNEscape - Read the Universal Character Name, check constraints and
>> +/// convert the UTF32 to UTF8. This is a subroutine of StringLiteralParser.
>> +/// When we decide to implement UCN's for character constants and identifiers,
>> +/// we will likely rework our support for UCN's.
>> +static void ProcessUCNEscape(const char *&ThisTokBuf, const char *ThisTokEnd,
>> +                             char *&ResultBuf, const char *ResultBufEnd,
>> +                             bool &HadError,
>> +                             SourceLocation Loc, Preprocessor &PP) {
>
>>
>> +  typedef unsigned int UTF32;
>
> How about typedef uint32_t UTF32?

Sure...

>
>
>> +  // If we didn't consume the proper number of digits, there is a problem.
>> +  if (UcnLen) {
>> +    PP.Diag(Loc, diag::err_ucn_escape_incomplete);
>
> Please use AdvanceToTokenCharacter so that the caret points exactly  
> to the right place in string diagnostics.  See  
> err_exponent_has_no_digits and friends for an example.
>

Sounds good...

>> +  // Now that we've parsed/checked the UCN, we convert from UTF32->UTF8.
>> +  // The conversion below was inspired by:
>> +  //   http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c
>> +  // First, we determine how many bytes the result will require.
>> +  typedef unsigned char UTF8;
>
> uint8_t ?

Sure...
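
For reference, a minimal sketch of the UTF-32 to UTF-8 step using the
fixed-width typedefs. It follows the same general shape as
ConvertUTF.c (size first, then fill bytes from the end), but EncodeUTF8
is a hypothetical name and the code in the patch may differ:

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t UTF32;
typedef uint8_t UTF8;

// Hypothetical helper (not from the patch): encodes one code point as UTF-8.
// Writes up to 4 bytes to Out and returns the number written, or 0 if the
// code point is a surrogate or lies above U+10FFFF.
static unsigned EncodeUTF8(UTF32 CodePoint, UTF8 *Out) {
  assert(Out && "output buffer must not be null");
  if (CodePoint > 0x10FFFF || (CodePoint >= 0xD800 && CodePoint <= 0xDFFF))
    return 0;

  // First, determine how many bytes the result will require.
  unsigned NumBytes;
  if (CodePoint < 0x80)
    NumBytes = 1;
  else if (CodePoint < 0x800)
    NumBytes = 2;
  else if (CodePoint < 0x10000)
    NumBytes = 3;
  else
    NumBytes = 4;

  // Lead-byte markers by length: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx.
  static const UTF8 FirstByteMark[5] = {0x00, 0x00, 0xC0, 0xE0, 0xF0};

  // Fill the continuation bytes (10xxxxxx) from the end, six bits at a time.
  for (unsigned i = NumBytes - 1; i > 0; --i) {
    Out[i] = static_cast<UTF8>((CodePoint & 0x3F) | 0x80);
    CodePoint >>= 6;
  }
  // Whatever bits remain go into the lead byte.
  Out[0] = static_cast<UTF8>(CodePoint | FirstByteMark[NumBytes]);
  return NumBytes;
}
```

With this, \u2192 from the test file encodes as the three bytes
E2 86 92.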

>
>
>> +      if (ThisTokBuf[1] == 'u' || ThisTokBuf[1] == 'U') {
>> +        ProcessUCNEscape(ThisTokBuf, ThisTokEnd, ResultPtr,
>> +                         GetString() + ResultBuf.size(),
>> +                         hadError, StringToks[i].getLocation(), PP);
>
> In an effort to unnest the else block,  how about using a continue  
> here?
>

O.k.
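
Roughly the shape I have in mind, as a standalone toy rather than the
real StringLiteralParser loop. HandleUCN, HandleCharEscape and
ClassifyEscapes are hypothetical stand-ins that only record which path
was taken:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the two escape handlers; each just appends a
// marker so the control flow is visible in the result.
static void HandleUCN(std::string &Out) { Out += 'U'; }
static void HandleCharEscape(std::string &Out) { Out += 'E'; }

// Walks a token spelling and marks each piece: '.' for an ordinary
// character, 'U' for a \u or \U escape, 'E' for any other escape.
static std::string ClassifyEscapes(const std::string &Tok) {
  std::string Out;
  for (std::size_t i = 0; i < Tok.size(); ++i) {
    if (Tok[i] != '\\') {
      Out += '.';
      continue;
    }
    if (i + 1 < Tok.size() && (Tok[i + 1] == 'u' || Tok[i + 1] == 'U')) {
      HandleUCN(Out);
      ++i;      // Skip the 'u' or 'U'.
      continue; // Early exit here replaces the nested else block.
    }
    // Ordinary escapes fall through at the outer nesting level.
    HandleCharEscape(Out);
    ++i; // Skip the escaped character.
  }
  return Out;
}
```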

>>
>> +int main(void) {
>> +  printf("%s (%d)\n", "hello \u2192 \u2603 \u2190 world", sizeof("hello \u2192 \u2603 \u2190 world"));
>> +  printf("%s (%d)\n", "\U00010400\U0001D12B", sizeof("\U00010400\U0001D12B"));
>> +  // Some error conditions...
>> +  printf("%s\n", "\U"); // expected-error{{\u used with no following hex digits}}
>> +  printf("%s\n", "\U00"); // expected-error{{incomplete universal character name}}
>> +  printf("%s\n", "\U0001"); // expected-error{{incomplete universal character name}}
>> +  printf("%s\n", "\u0001"); // expected-error{{invalid universal character}}
>
> Very nice!
>
> -Chris