[cfe-commits] r68076 - in /cfe/trunk: include/clang/Basic/DiagnosticLexKinds.td lib/Lex/LiteralSupport.cpp test/Sema/ucn-cstring.c

Tue Mar 31 10:52:28 PDT 2009

On Mar 30, 2009, at 4:46 PM, Steve Naroff wrote:

> Author: snaroff
> Date: Mon Mar 30 18:46:03 2009
> New Revision: 68076
>
> URL: http://llvm.org/viewvc/llvm-project?rev=68076&view=rev
> Log:
> Implement UCN support for C string literals (C99 6.4.3) and add some  
> very basic tests. Chris Goller has graciously offered to write some  
> test to help validate UCN support.

Very nice Steve, this turned out quite elegant!

Some more minor things:

>
> -
> +/// ProcessUCNEscape - Read the Universal Character Name, check  
> constraints and
> +/// convert the UTF32 to UTF8. This is a subroutine of  
> StringLiteralParser.
> +/// When we decide to implement UCN's for character constants and  
> identifiers,
> +/// we will likely rework our support for UCN's.
> +static void ProcessUCNEscape(const char *&ThisTokBuf, const char  
> *ThisTokEnd,
> +                             char *&ResultBuf, const char  
> *ResultBufEnd,
> +                             bool &HadError,
> +                             SourceLocation Loc, Preprocessor &PP) {

>
> +  typedef unsigned int UTF32;

How about typedef uint32_t UTF32?

> +  // If we didn't consume the proper number of digits, there is a  
> problem.
> +  if (UcnLen) {
> +    PP.Diag(Loc, diag::err_ucn_escape_incomplete);

Please use AdvanceToTokenCharacter so that the caret points exactly to  
the right place in string diagnostics.  See err_exponent_has_no_digits  
and friends for an example.

> +  // Now that we've parsed/checked the UCN, we convert from UTF32- 
> >UTF8.
> +  // The conversion below was inspired by:
> +  //   http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c
> +  // First, we determine how many bytes the result will require.
> +  typedef unsigned char UTF8;

uint8_t ?

> +      if (ThisTokBuf[1] == 'u' || ThisTokBuf[1] == 'U') {
> +        ProcessUCNEscape(ThisTokBuf, ThisTokEnd, ResultPtr,
> +                         GetString() + ResultBuf.size(),
> +                         hadError, StringToks[i].getLocation(), PP);

In an effort to unnest the else block,  how about using a continue here?

>
> +int main(void) {
> +  printf("%s (%d)\n", "hello \u2192 \u2603 \u2190 world",  
> sizeof("hello \u2192 \u2603 \u2190 world"));
> +  printf("%s (%d)\n", "\U00010400\U0001D12B",  
> sizeof("\U00010400\U0001D12B"));
> +  // Some error conditions...
> +  printf("%s\n", "\U"); // expected-error{{\u used with no  
> following hex digits}}
> +  printf("%s\n", "\U00"); // expected-error{{incomplete universal  
> character name}}
> +  printf("%s\n", "\U0001"); // expected-error{{incomplete universal  
> character name}}
> +  printf("%s\n", "\u0001"); // expected-error{{invalid universal  
> character}}

Very nice!

-Chris