[cfe-dev] Handling of UCNs >256 in narrow character literals

Craig Topper craig.topper at gmail.com
Sat Jul 2 22:26:57 PDT 2011


UCN values greater than 255 in narrow character literals are not checked
at all to see whether they fit in a char. They are silently stored in a
char with no warning about the loss of data. The low byte of the UCN is
then sign-extended, which corrupts the upper bits of the UCN value,
again with no warning.
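
To make the truncation concrete, here is a minimal sketch that models it
outside the compiler. It is not clang's code: the helper name
store_ucn_in_signed_char is hypothetical, it assumes an 8-bit signed char,
and U+20AC is just an example code point I picked.

```c
#include <stdint.h>

/* Hypothetical model, not clang's code: keep only the low byte of a UCN
   code point, then widen it the way a signed 8-bit char widens to int. */
static int32_t store_ucn_in_signed_char(uint32_t code_point)
{
    /* Everything above bit 7 is dropped here without any diagnostic. */
    int32_t low_byte = (int32_t)(code_point & 0xFFu);

    /* Values 0x80..0xFF are negative when read as a signed 8-bit char,
       so widening to int fills the upper bits with ones. */
    return low_byte >= 0x80 ? low_byte - 0x100 : low_byte;
}
```

Under those assumptions, U+20AC keeps only 0xAC and comes back as -84
(0xFFFFFFAC as a 32-bit int), while U+0141 comes back as 0x41, which is
indistinguishable from a plain 'A'.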

Also, if you put multiple characters in a narrow character literal and
the second one is a UCN, the earlier character will be corrupted,
because the UCN value is ADDed to the original character shifted 8 bits
to the left without first being clipped to a single byte.
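
The arithmetic behind that corruption can be sketched as follows. Both
function names are hypothetical and this only models the shift-and-add
described above, with a masked variant for comparison; it is not taken
from clang's source.

```c
#include <stdint.h>

/* Models the behaviour described above: the full UCN value is added to
   the previously accumulated characters after shifting them left by 8. */
static uint32_t append_unclipped(uint32_t value_so_far, uint32_t code_point)
{
    return (value_so_far << 8) + code_point;
}

/* Same accumulation step, but with the UCN clipped to one byte first,
   so the earlier character's bits are left untouched. */
static uint32_t append_clipped(uint32_t value_so_far, uint32_t code_point)
{
    return (value_so_far << 8) | (code_point & 0xFFu);
}
```

Taking 'a' (0x61) followed by U+20AC as an example, the unclipped step
gives 0x6100 + 0x20AC = 0x81AC, so the 'a' byte has turned into 0x81. The
clipped step gives 0x61AC, which loses the upper byte of the UCN but at
least preserves the 'a'.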

GCC seems to handle UCNs in narrow character literals by converting them
to UTF-8 and treating the resulting bytes as a multi-character value.
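
A rough sketch of that approach, assuming the UTF-8 bytes are packed the
same way an ordinary multi-character literal would be (shift left by 8,
OR in the next byte). The function names utf8_encode and ucn_as_multichar
are hypothetical, and I have not checked that GCC produces exactly these
numeric values.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8. Returns the number of bytes written,
   or 0 if the value is above U+10FFFF. Surrogates are not rejected here. */
static size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Pack the UTF-8 bytes of a code point as if they were the characters
   of a multi-character literal, most significant byte first. */
static uint32_t ucn_as_multichar(uint32_t cp)
{
    uint8_t bytes[4];
    size_t n = utf8_encode(cp, bytes);
    uint32_t value = 0;
    for (size_t i = 0; i < n; ++i)
        value = (value << 8) | bytes[i];
    return value;
}
```

With this model U+20AC becomes the bytes E2 82 AC and packs to 0xE282AC,
so no bits of the code point are silently discarded.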

Should clang match GCC here or at the very least clip the UCN and
issue a warning that the UCN was too large?

-- 
~Craig


