[cfe-dev] Almost there...
neil at daikokuya.co.uk
Sun Jun 7 01:18:41 PDT 2009
Eli Friedman wrote:-
> >> Reason I am thinking about this now is what does a d-char mean for a
> >> char32_t string?  Assuming we can read a UTF32 formatted source file, those
> > You should be able to assume the basic character set is single byte;
> > both C and C++ require this.  So no UTF32 source files.
> I don't see any connection between the basic character set and the
> encoding of the source file.
The source character set is generally understood to be the character
set the user works in, i.e. that of their terminal, editor etc.
Each member of the basic character set is required to be represented
as a single byte in the source character set.
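To make the single-byte point concrete, here is a rough sketch in Python (not anything from clang). The BASIC string is my own approximation of the C basic source character set, and single_byte_in is a hypothetical helper name; it just checks whether every member encodes to exactly one byte in a given encoding.

```python
import string

# Approximation of the C basic source character set: letters, digits,
# the 29 graphic characters, and space / tab / VT / FF / newline.
BASIC = (
    string.ascii_letters
    + string.digits
    + "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
    + " \t\v\f\n"
)

def single_byte_in(encoding):
    """Return True if every basic character is one byte in `encoding`."""
    return all(len(ch.encode(encoding)) == 1 for ch in BASIC)

for enc in ("ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le"):
    print(enc, single_byte_in(enc))
```

Under this check ASCII, ISO-8859-1 and UTF-8 pass, while UTF-16 and UTF-32 do not, which is the sense in which a UTF32 file cannot itself be the source character set.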
> > Something else to think about: how you track source locations if you
> > iconv the whole file upfront.
> Source locations ought to just point into the converted buffer, I
> think; we don't need to know the byte offsets in the original file.
If you're going to quote the source then you'll need to convert
back again - someone using an ISO-8859 terminal or Japanese terminal
won't want mangled UTF-8 diagnostics. Charset conversion is not
reversible in general, whether that's a practical issue is not
Apple's "interesting" decision to encode their headers in neither
ASCII nor UTF-8 will have implications too.
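A small sketch of the two issues above, in Python rather than anything clang-specific. The sample source line and the helper name quote_for_terminal are made up for illustration; the assumption is a file stored in ISO-8859-1 that is converted to UTF-8 up front.

```python
# Hypothetical source line as stored on disk in ISO-8859-1.
original = 'char *s = "caf\u00e9";  int x;'.encode("latin-1")

# Convert the whole buffer to UTF-8 up front.
converted = original.decode("latin-1").encode("utf-8")

# The byte offset of the token `int` differs between the two buffers,
# because the e-acute is one byte in ISO-8859-1 but two in UTF-8.
offset_original = original.index(b"int")
offset_converted = converted.index(b"int")

def quote_for_terminal(utf8_bytes, terminal_encoding):
    """Re-encode a UTF-8 snippet for a terminal.

    Characters the terminal encoding cannot represent become '?',
    which is where the conversion stops being reversible.
    """
    text = utf8_bytes.decode("utf-8")
    return text.encode(terminal_encoding, errors="replace")

# Round trip works when the terminal matches the original encoding...
same = quote_for_terminal(converted, "latin-1") == original

# ...but Japanese text quoted to an ISO-8859-1 terminal is lost.
lossy = quote_for_terminal("\u65e5\u672c".encode("utf-8"), "latin-1")
```

So a location into the converted buffer is fine for internal use, but any diagnostic that quotes source or reports a column in terms of the original file needs either a mapping back or a second conversion, and that second conversion can lose information.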