[cfe-dev] Wide strings and clang::StringLiteral.
Neil Booth
neil at daikokuya.co.uk
Tue Dec 2 06:48:26 PST 2008
Jean-Daniel Dupas wrote:-
> I didn't know that C99 supports UCNs in identifiers.
> I don't see much information about it in the C99 spec (except that
> UCNs may appear in an identifier). Does this mean that this code is
> valid?
>
> ---------- test.c -------
>
> int main (int argc, char **argv) {
> int h\u00e9 = 0; // hé
> return he\u0301; // hé - using decomposed form
> }
> --------------------------
>
> Actually, GCC does not support combining characters (like COMBINING
> ACUTE ACCENT, U+0301):
>
> test.c:4:9: error: universal character \u0301 is not valid in an
> identifier
> test.c: In function ‘main’:
> test.c:4: error: ‘hé’ undeclared (first use in this function)
> test.c:4: error: (Each undeclared identifier is reported only once
> test.c:4: error: for each function it appears in.)
>
> Note that the error is correctly displayed anyway.
My front end gives:
$ ~/src/c/cfe /tmp/test.c
"/tmp/test.c", line 3: error: universal character name "\u0301" cannot
be used in an identifier
return he\u0301; // hé - using decomposed form
--^^^^^^
1 error found compiling "/tmp/bug.c".
So you've chosen an invalid UCN. The standard lists the UCNs that are
acceptable in identifiers (C99 Annex D); apparently \u0301 isn't one
of them.
Just as "one" and "One" might be considered the same identifier but
are different in the standard, the precomposed and decomposed spellings
of "hé" would be different identifiers too. As I read it, the standard
couldn't care less about combining characters, case, etc.; identity is
purely a function of which Unicode code points are spelled. So \u00aa,
\u00Aa and \U000000aA are identical simply because they denote the
same code point.
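
To make the "same code point, same identifier character" rule concrete,
here is a minimal sketch in C. It is not taken from any front end; the
helper names ucn_code_point and same_ucn are made up for illustration,
and it assumes an ASCII-compatible execution character set. It only
decodes a single UCN spelling and compares code points; it does not
check whether the code point is allowed in an identifier.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: decode one UCN spelling, either "\uXXXX" or
   "\UXXXXXXXX", into its code point. Returns 1 on success and 0 if the
   spelling is malformed. Assumes 'a'..'f' are contiguous (ASCII). */
int ucn_code_point(const char *s, unsigned long *out)
{
    size_t digits, i;
    unsigned long value = 0;

    assert(s != NULL && out != NULL);

    if (s[0] != '\\')
        return 0;
    if (s[1] == 'u')
        digits = 4;             /* short form: exactly 4 hex digits */
    else if (s[1] == 'U')
        digits = 8;             /* long form: exactly 8 hex digits */
    else
        return 0;

    for (i = 0; i < digits; i++) {
        unsigned char c = (unsigned char)s[2 + i];
        if (!isxdigit(c))       /* also stops at a premature '\0' */
            return 0;
        value = value * 16
              + (unsigned long)(isdigit(c) ? c - '0'
                                           : tolower(c) - 'a' + 10);
    }
    if (s[2 + digits] != '\0')  /* reject trailing characters */
        return 0;

    *out = value;
    return 1;
}

/* Two UCN spellings are equivalent exactly when they decode to the
   same code point; hex-digit case and \u versus \U do not matter. */
int same_ucn(const char *a, const char *b)
{
    unsigned long ca, cb;
    return ucn_code_point(a, &ca) && ucn_code_point(b, &cb) && ca == cb;
}
```

With this sketch, same_ucn("\\u00aa", "\\U000000aA") returns 1, while
same_ucn("\\u00e9", "\\u0301") returns 0: the comparison never looks at
case folding or at composed versus decomposed forms.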
Neil.