[cfe-dev] Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

Karsten Blees karsten.blees at dcon.de
Sun May 10 11:47:32 PDT 2015


On 09.05.2015 at 05:17, 罗勇刚 (Yonggang Luo) wrote:

> Is that getting wchar_t to be 32bit on win32 a good idea

wchar_t must match what the compiler generates for L"string" literals. Thus you cannot "change" wchar_t in a library or compatibility layer, as it is a compiler property.
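To see that the width of wchar_t comes from the toolchain and platform ABI rather than from any library, here is a minimal sketch in Python (chosen only for brevity; the name WCHAR_SIZE is illustrative). It asks ctypes for the size of the C wchar_t that the interpreter was built against, which is typically 2 bytes on Windows and 4 bytes on Linux.

```python
import ctypes

# Size in bytes of the C wchar_t type as seen by ctypes on this platform.
# The value is fixed by the compiler/ABI that built the interpreter;
# no library call or compatibility layer can change it at run time.
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)

print(f"wchar_t is {WCHAR_SIZE * 8} bits wide on this platform")
```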

Furthermore, using a 32-bit wchar_t on Windows would break binary compatibility with existing libraries. Programs that cannot use, e.g., kernel32.dll are completely useless: they cannot do anything.

> One primary objective of code portability and posix-compatibility layer for win32 is to _remove_ the need for OS-specific code-paths. A wchar_t that is anything short (no pun intended) of a 32-bit integer will render it impossible to build out of the box many pieces of commonly-used software, including, but not limited to musl libc, the curses library, and anything that expects wchar_t to cover the entire unicode range.

Any software that uses wchar_t to represent Unicode is inherently platform specific / not portable.
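As a rough illustration of that platform dependence (a sketch only; the helper code_units and the sample string are made up for this example), the same text needs a different number of wide-character units depending on whether the unit is 16 or 32 bits wide:

```python
def code_units(text: str, encoding: str, unit_bytes: int) -> int:
    """Number of fixed-size code units needed to store text in the given encoding."""
    return len(text.encode(encoding)) // unit_bytes

# 'a' plus one code point outside the Basic Multilingual Plane (U+1F600).
sample = "a\U0001F600"

utf16_units = code_units(sample, "utf-16-le", 2)  # 16-bit units, as with wchar_t on Windows
utf32_units = code_units(sample, "utf-32-le", 4)  # 32-bit units, as with wchar_t on most Unix systems

print(utf16_units, utf32_units)  # prints "3 2"
```

Code that assumes one wchar_t per character therefore gives different answers for the same input on the two platforms.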

For example: POSIX requires that wide characters can be processed in isolation, e.g. each wide character has a specific width (see the wcwidth() API and the format of character set description files). This doesn't fly with Unicode's combining characters. E.g. a triple of any two Unicode characters followed by the tie/breve \u0361 has a width of two. A POSIX-compliant wchar_t would need distinct wide character codes for all such combinations (i.e. requiring at least 3 * 21 = 63 bits, since a single Unicode code point already needs 21 bits).
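The arithmetic can be checked with a small sketch. The packing scheme below (pack_triple) is purely hypothetical and only serves to count bits; it is not an encoding that any libc uses.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF
# Bits needed to represent any single Unicode code point.
BITS_PER_CODE_POINT = MAX_CODE_POINT.bit_length()

def pack_triple(a: str, b: str, c: str) -> int:
    """Pack three code points into one integer (hypothetical scheme, for bit counting only)."""
    return (
        (ord(a) << (2 * BITS_PER_CODE_POINT))
        | (ord(b) << BITS_PER_CODE_POINT)
        | ord(c)
    )

tie = "\u0361"  # COMBINING DOUBLE INVERTED BREVE
# A non-zero canonical combining class marks a combining character.
is_combining = unicodedata.combining(tie) != 0

# Worst case: all three code points at the top of the Unicode range.
top = chr(MAX_CODE_POINT)
worst_case_bits = pack_triple(top, top, top).bit_length()

print(BITS_PER_CODE_POINT, worst_case_bits, is_combining)  # prints "21 63 True"
```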

Therefore, libc implementations that use wchar_t for Unicode cannot be strictly POSIX compliant (regardless of whether wchar_t is UTF-32, UTF-16 or UTF-8).

The Unicode specification, chapter 5.2, recommends using char16_t / char32_t for Unicode, not wchar_t.

Just my 2c
Karsten




