[cfe-dev] Almost there...

Sun Jun 7 01:18:41 PDT 2009

Eli Friedman wrote:-

> >> Reason I am thinking about this now is what does a d-char mean for a
> >> char32_t string?  Assuming we can read a UTF32 formatted source file, those
> >
> > You should be able to assume the basic character set is single byte;
> > both C and C++ require this.  So no UTF32 source files.
> 
> I don't see any connection between the basic character set and the
> encoding of the source file.

The source character set is generally understood to be the character
set in which the user interacts with their terminal, editor, etc.

http://www.dinkumware.com/manuals/?manual=compleat&page=charset.html

Each member of the basic character set is required to be represented
as a single byte in the source character set.
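To make the single-byte requirement concrete, here is a minimal sketch
that checks it against a few encodings.  The character list follows the
basic source character set as enumerated in C99 5.2.1, and the helper
name basic_set_is_single_byte is made up for this illustration; nothing
here is taken from any compiler.

```python
import string

# Assumed list for this sketch: letters, digits, the 29 graphic
# characters, plus space, tab, vertical tab, form feed and newline.
BASIC_CHARS = (
    string.ascii_letters
    + string.digits
    + "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
    + " \t\v\f\n"
)


def basic_set_is_single_byte(encoding: str) -> bool:
    """Return True if every basic character encodes to exactly one byte."""
    return all(len(ch.encode(encoding)) == 1 for ch in BASIC_CHARS)


if __name__ == "__main__":
    for enc in ("utf-8", "latin-1", "utf-16-le", "utf-32-le"):
        print(enc, basic_set_is_single_byte(enc))
```

Under this reading, UTF-8 and ISO-8859-1 pass while UTF-16 and UTF-32
do not, which is the basis for ruling out UTF32 source files.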

> > Something else to think about: how you track source locations if you
> > iconv the whole file upfront.
> 
> Source locations ought to just point into the converted buffer, I
> think; we don't need to know the byte offsets in the original file.
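For reference, a minimal sketch of what is at stake: once a file is
converted up front, offsets into the converted buffer no longer match
offsets into the original bytes.  The function convert_with_offset_map
is hypothetical, and it assumes a stateless source encoding (such as
ISO-8859-1) so that each character can be re-encoded on its own to find
its original width.

```python
def convert_with_offset_map(raw: bytes, src_encoding: str):
    """Convert raw to UTF-8 and map converted offsets to original ones.

    Returns (utf8_bytes, offset_map), where offset_map maps the UTF-8
    byte offset at which each character starts to that character's byte
    offset in the original buffer.
    """
    text = raw.decode(src_encoding)
    utf8 = bytearray()
    offset_map = {}
    orig_offset = 0
    for ch in text:
        offset_map[len(utf8)] = orig_offset
        utf8 += ch.encode("utf-8")
        # Assumption: the encoding is stateless, so one character
        # re-encodes to the same number of bytes it originally took.
        orig_offset += len(ch.encode(src_encoding))
    return bytes(utf8), offset_map


if __name__ == "__main__":
    # An ISO-8859-1 source fragment containing one non-ASCII byte.
    utf8, offsets = convert_with_offset_map(b'"caf\xe9" x', "latin-1")
    pos = utf8.index(b"x")
    print(pos, offsets[pos])
```

Every character after the first non-ASCII one sits at a different
offset in the two buffers, so a table like this is only needed if the
original offsets matter.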

If you're going to quote the source then you'll need to convert
back again - someone using an ISO-8859 terminal or Japanese terminal
won't want mangled UTF-8 diagnostics.  Charset conversion is not
reversible in general; whether that's a practical issue is not
clear.
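A minimal sketch of one way the conversion fails to be reversible:
text held internally as Unicode is written out in the terminal's
charset and then read back.  The helper survives_round_trip is made up
for this illustration, and it shows only the case of characters with no
counterpart in the target charset; other cases (stateful encodings, for
instance) are not covered.

```python
def survives_round_trip(text: str, terminal_encoding: str) -> bool:
    """Return True if text is unchanged after encoding and decoding.

    Characters the terminal charset cannot represent are replaced with
    '?' on output, which is where information is lost.
    """
    encoded = text.encode(terminal_encoding, errors="replace")
    return encoded.decode(terminal_encoding) == text


if __name__ == "__main__":
    samples = [
        ("caf\u00e9", "latin-1"),
        ("\u65e5\u672c\u8a9e", "latin-1"),
        ("\u65e5\u672c\u8a9e", "shift_jis"),
    ]
    for text, enc in samples:
        print(enc, survives_round_trip(text, enc))
```

Japanese text quoted in a diagnostic survives on a Shift_JIS terminal
but not on an ISO-8859-1 one, so the result depends on the user's
terminal rather than on the source file alone.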

Apple's "interesting" decision to encode their headers in neither
ASCII nor UTF-8 will have implications too.

Neil.