[cfe-dev] Almost there...
Neil Booth
neil at daikokuya.co.uk
Sun Jun 7 01:18:41 PDT 2009
Eli Friedman wrote:-
> >> Reason I am thinking about this now is what does a d-char mean for a
> >> char32_t string?  Assuming we can read a UTF32 formatted source file, those
> >
> > You should be able to assume the basic character set is single byte;
> > both C and C++ require this.  So no UTF32 source files.
>
> I don't see any connection between the basic character set and the
> encoding of the source file.
The source character set is generally understood to be the character
set the user interacts with through their terminal, editor, etc.
http://www.dinkumware.com/manuals/?manual=compleat&page=charset.html
Each member of the basic character set is required to be represented
as a single byte in the source character set.
> > Something else to think about: how you track source locations if you
> > iconv the whole file upfront.
>
> Source locations ought to just point into the converted buffer, I
> think; we don't need to know the byte offsets in the original file.
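If the original byte offsets do turn out to matter (for example, to quote the user's own bytes back to them), one option is to build an offset table while converting. A minimal sketch, assuming an ISO-8859-1 input file converted to UTF-8; `latin1_to_utf8_with_map` is a hypothetical helper for illustration, not a Clang or iconv API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (assumes ISO-8859-1 input): converts the buffer to UTF-8
   and records, for every output byte, the offset of the input byte it came
   from. A location in the converted buffer can then be mapped back to the
   original file. Caller frees *out and *map. Returns the output length. */
static size_t latin1_to_utf8_with_map(const unsigned char *in, size_t len,
                                      unsigned char **out, size_t **map)
{
    assert(in != NULL || len == 0);
    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    unsigned char *buf = malloc(2 * len + 1);
    size_t *offs = malloc((2 * len + 1) * sizeof *offs);
    size_t n = 0;
    if (!buf || !offs) {
        free(buf);
        free(offs);
        *out = NULL;
        *map = NULL;
        return 0;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII: copied unchanged, one output byte. */
            buf[n] = c;
            offs[n++] = i;
        } else {
            /* U+0080..U+00FF: two-byte UTF-8 sequence, both bytes map to i. */
            buf[n] = (unsigned char)(0xC0 | (c >> 6));
            offs[n++] = i;
            buf[n] = (unsigned char)(0x80 | (c & 0x3F));
            offs[n++] = i;
        }
    }
    buf[n] = '\0';
    *out = buf;
    *map = offs;
    return n;
}
```

For the Latin-1 bytes "caf\xE9 x" (6 bytes) this produces 7 UTF-8 bytes, and the two bytes of the encoded é both map back to offset 3. The cost is one `size_t` per converted byte, which is the price of not discarding the original positions.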
If you're going to quote the source then you'll need to convert
back again: someone using an ISO-8859 terminal or a Japanese terminal
won't want mangled UTF-8 diagnostics. Charset conversion is not
reversible in general; whether that's a practical issue is not
clear.
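To illustrate why converting back is not reversible in general, here is a minimal sketch, assuming the internal buffer is UTF-8 and the user's terminal is ISO-8859-1. `utf8_to_latin1` is a hypothetical helper, and substituting '?' for unrepresentable characters is just one possible policy:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts a UTF-8 buffer to ISO-8859-1 for display on
   a Latin-1 terminal. Code points above U+00FF, stray bytes and truncated
   sequences are replaced by '?', so the original text cannot be recovered.
   This sketch does not reject overlong encodings.
   `out` must have room for len + 1 bytes. Returns the number of replacements. */
static size_t utf8_to_latin1(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    assert(in != NULL && out != NULL);
    size_t i = 0, n = 0, lost = 0;
    while (i < len) {
        unsigned char c = in[i];
        unsigned long cp;
        size_t extra, j;
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else { out[n++] = '?'; lost++; i++; continue; }       /* stray byte */
        if (extra > len - i - 1) { out[n++] = '?'; lost++; break; } /* truncated */
        for (j = 1; j <= extra; j++) {
            if ((in[i + j] & 0xC0) != 0x80)
                break;                                 /* not a continuation byte */
            cp = (cp << 6) | (in[i + j] & 0x3F);
        }
        if (j <= extra) { out[n++] = '?'; lost++; i += j; continue; }
        i += extra + 1;
        if (cp <= 0xFF) {
            out[n++] = (unsigned char)cp;   /* Latin-1 code points equal U+0000..U+00FF */
        } else {
            out[n++] = '?';                 /* no Latin-1 equivalent: information lost */
            lost++;
        }
    }
    out[n] = '\0';
    return lost;
}
```

Given the UTF-8 bytes for "café 日" ("caf\xC3\xA9 \xE6\x97\xA5", 9 bytes), this writes "caf\xE9 ?" and reports one replacement: the é survives the round trip, but U+65E5 does not, and nothing in the output says what it was.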
Apple's "interesting" decision to encode their headers in neither
ASCII nor UTF-8 will have implications too.
Neil.