[cfe-dev] Wide strings and clang::StringLiteral.

Neil Booth neil at daikokuya.co.uk
Tue Dec 2 06:48:26 PST 2008


Jean-Daniel Dupas wrote:-

> I didn't know that C99 supports UCNs in identifiers.
> I don't see much information about it in the C99 spec (except that 
> UCNs may appear in an identifier). Does this mean that this code is
> valid?
>
> ---------- test.c -------
>
> int main (int argc, char **argv) {
> 	int h\u00e9 = 0; // hé
> 	return he\u0301; // hé - using decomposed form
> }
> --------------------------
>
> Actually, GCC does not support combining characters (like COMBINING ACUTE 
> ACCENT, U+0301):
>
> test.c:4:9: error: universal character \u0301 is not valid in an  
> identifier
> test.c: In function ‘main’:
> test.c:4: error: ‘hé’ undeclared (first use in this function)
> test.c:4: error: (Each undeclared identifier is reported only once
> test.c:4: error: for each function it appears in.)
>
> Note that the error is correctly displayed anyway.

My front end gives

$ ~/src/c/cfe /tmp/test.c 
"/tmp/test.c", line 3: error: universal character name "\u0301" cannot
       be used in an identifier
        return he\u0301; // hé - using decomposed form
               --^^^^^^                                

1 error found compiling "/tmp/bug.c".

So you've chosen an invalid UCN.  The standard lists the UCNs that are
acceptable in identifiers (C99 Annex D); apparently \u0301 isn't one
of them.
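
To make the "list of acceptable UCNs" concrete, here is a minimal
sketch of a range lookup.  The table is only a small excerpt, recalled
from the Latin rows of C99 Annex D and to be checked against the
standard before relying on it; the names ucn_range, latin_excerpt and
ucn_ok_in_identifier are hypothetical and not taken from any real
front end.

```c
#include <stddef.h>

/* Hypothetical type: one inclusive range of code points. */
struct ucn_range { unsigned long lo, hi; };

/* Illustrative excerpt only (assumption: the first Latin entries of
   C99 Annex D).  The real table is much longer. */
static const struct ucn_range latin_excerpt[] = {
    { 0x00AA, 0x00AA },
    { 0x00BA, 0x00BA },
    { 0x00C0, 0x00D6 },
    { 0x00D8, 0x00F6 },
    { 0x00F8, 0x01F5 },
};

/* Returns 1 if cp falls inside one of the excerpted ranges, else 0.
   A linear scan is enough for a table this small. */
static int ucn_ok_in_identifier(unsigned long cp)
{
    size_t i;
    size_t n = sizeof latin_excerpt / sizeof latin_excerpt[0];

    for (i = 0; i < n; i++)
        if (cp >= latin_excerpt[i].lo && cp <= latin_excerpt[i].hi)
            return 1;
    return 0;
}
```

Under this excerpt, 0x00E9 (the precomposed é) is accepted and 0x0301
(COMBINING ACUTE ACCENT) is rejected, which matches the diagnostics
shown above.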

People might consider "one" and "One" to be the same identifier, but
they are different in the standard, and the same goes for your two
spellings of hé.  As I read it, the standard couldn't care less about
combining characters, case, etc.; identity is purely a function of the
Unicode code points as spelled.  So \u00aa, \u00Aa and \U000000aA are
identical simply because they represent the same code point.
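
The comparison rule can be sketched as follows: decode each UCN
spelling to its code point and compare the numbers, not the text.
This is only an illustration under stated assumptions; ucn_value and
same_ucn are hypothetical helpers, they handle a single UCN per string,
and the hex-digit decoding assumes an ASCII-compatible execution
character set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode one UCN spelling ("\uXXXX" or
   "\UXXXXXXXX") into its code point.
   Returns 0 on success, -1 if the spelling is malformed. */
static int ucn_value(const char *s, unsigned long *out)
{
    size_t digits, i;
    unsigned long v = 0;

    assert(s != NULL && out != NULL);
    if (s[0] != '\\')
        return -1;
    if (s[1] == 'u')
        digits = 4;             /* short form: exactly 4 hex digits */
    else if (s[1] == 'U')
        digits = 8;             /* long form: exactly 8 hex digits */
    else
        return -1;
    if (strlen(s) != 2 + digits)
        return -1;

    for (i = 0; i < digits; i++) {
        char c = s[2 + i];
        unsigned long d;

        /* Assumes ASCII-style contiguous 'a'..'f' and 'A'..'F'. */
        if (c >= '0' && c <= '9')
            d = (unsigned long)(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = (unsigned long)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F')
            d = (unsigned long)(c - 'A') + 10;
        else
            return -1;
        v = v * 16 + d;
    }
    *out = v;
    return 0;
}

/* Two spellings name the same character iff both decode and their
   code points are equal; hex-digit case and \u versus \U do not
   matter. */
static int same_ucn(const char *a, const char *b)
{
    unsigned long va, vb;

    if (ucn_value(a, &va) != 0 || ucn_value(b, &vb) != 0)
        return 0;
    return va == vb;
}
```

With these definitions, same_ucn("\\u00aa", "\\U000000aA") is nonzero,
whereas a plain strcmp on the two spellings would report a difference.
No normalization is attempted, so \u00e9 and e\u0301 stay distinct,
which is the behaviour described above.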

Neil.


