[cfe-dev] Wide strings and clang::StringLiteral.
Kenneth Boyd
zaimoni at zaimoni.com
Wed Dec 3 09:31:56 PST 2008
Paolo Bolzoni wrote:
> dear cfe-devs,
>
> I think we are worrying about less important details. Universal character
> names in identifiers are, of course, important. But I think it is much more
> urgent finding a way to manage wide string correctly.
>
Universal character names (UCNs) in identifiers are also easy -- and
UCNs provide an alternate way to represent Unicode characters in wide
strings.
> Personally I never seen identifiers with extended characters, but I can
> easily imagine L"non ascii string" in non-English programs.
>
> So what about focusing about a normalized way to memorize wide strings and
> thinking about extended characters in identifiers later?
>
As I see it (speaking naively, I have no real experience here), the
"normalized" way to store wide strings would be to abstract the
encoding into a class that chooses "on the fly" between UTF-8, UTF-16,
UCS-2, and UTF-32. [Note that the choice of wchar_t depends on
what character set one is supporting: it should be either 16+ bits (for
UCS-2) or 32+ bits (for UTF-32). It should be possible to switch this
class between conversion libraries at compile-time. As UTF-8 and
UTF-16 are variable-width encodings, they are strictly disallowed for
wchar_t (and wide strings) but will be allowed for Unicode strings if
they make it into the next standard.]
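
A minimal sketch of such a class, assuming the string is held internally
as a sequence of Unicode code points and converted to code units only on
request. The names NormalizedString and Encoding are hypothetical and are
not part of clang; the encoding is picked at run time here, and the
compile-time choice of conversion library is left out.

```cpp
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical names for illustration only; this is not clang's API.
enum class Encoding { UTF8, UTF16, UCS2, UTF32 };

// Holds a string as Unicode code points and emits code units on demand.
class NormalizedString {
public:
  explicit NormalizedString(std::vector<std::uint32_t> codePoints)
      : cps_(std::move(codePoints)) {}

  // Returns the code units for the requested encoding, one per element.
  std::vector<std::uint32_t> encode(Encoding enc) const {
    std::vector<std::uint32_t> out;
    for (std::uint32_t cp : cps_) {
      // Reject values that are not Unicode scalar values.
      if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
      switch (enc) {
      case Encoding::UTF32:
        out.push_back(cp);
        break;
      case Encoding::UCS2:
        // UCS-2 has no surrogate pairs, so only the BMP is representable.
        if (cp > 0xFFFF)
          throw std::range_error("code point outside the BMP");
        out.push_back(cp);
        break;
      case Encoding::UTF16:
        if (cp < 0x10000) {
          out.push_back(cp);
        } else {
          std::uint32_t v = cp - 0x10000;
          out.push_back(0xD800 + (v >> 10));   // high surrogate
          out.push_back(0xDC00 + (v & 0x3FF)); // low surrogate
        }
        break;
      case Encoding::UTF8:
        if (cp < 0x80) {
          out.push_back(cp);
        } else if (cp < 0x800) {
          out.push_back(0xC0 | (cp >> 6));
          out.push_back(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
          out.push_back(0xE0 | (cp >> 12));
          out.push_back(0x80 | ((cp >> 6) & 0x3F));
          out.push_back(0x80 | (cp & 0x3F));
        } else {
          out.push_back(0xF0 | (cp >> 18));
          out.push_back(0x80 | ((cp >> 12) & 0x3F));
          out.push_back(0x80 | ((cp >> 6) & 0x3F));
          out.push_back(0x80 | (cp & 0x3F));
        }
        break;
      }
    }
    return out;
  }

private:
  std::vector<std::uint32_t> cps_;
};
```

Keeping code points as the stored form means the width of wchar_t on the
target only matters at the point where the code units are emitted.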
----
Well...why not "just do it" and adjust the C/C++ identifier length
detection (as C strings) to leave UCNs *unaltered*?
(Note that I need to clone the test suite and iron out the MinGW32
bugs before actually doing serious work on constructing an LLVM backend
for my vaporware C/C++/FORTRAN compiler, so this is somewhat of a
"noise" suggestion -- but it is very simple, and is what I'm doing for
preprocessing.)
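
A rough sketch of what leaving UCNs unaltered could look like in an
identifier length scan. The function names are hypothetical, and the
sketch only checks the \uXXXX and \UXXXXXXXX spelling; it does not check
whether the named character is one the standard permits in an
identifier.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length in source characters of a UCN
// (\uXXXX or \UXXXXXXXX) starting at s[i], or 0 if there is none.
static std::size_t ucnLength(const std::string &s, std::size_t i) {
  if (i + 1 >= s.size() || s[i] != '\\')
    return 0;
  std::size_t digits = s[i + 1] == 'u' ? 4 : s[i + 1] == 'U' ? 8 : 0;
  if (digits == 0 || i + 2 + digits > s.size())
    return 0;
  for (std::size_t k = 0; k < digits; ++k)
    if (!std::isxdigit(static_cast<unsigned char>(s[i + 2 + k])))
      return 0;
  return 2 + digits;
}

// Returns the length of the identifier at the start of src, counting each
// UCN as the 6 or 10 source characters it occupies. The UCN is not
// decoded; it stays in the identifier text exactly as written.
std::size_t identifierLength(const std::string &src) {
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (c == '_' || std::isalpha(c) || (i > 0 && std::isdigit(c))) {
      ++i; // ordinary identifier character
      continue;
    }
    if (std::size_t n = ucnLength(src, i)) {
      i += n; // skip over the whole UCN
      continue;
    }
    break; // anything else ends the identifier
  }
  return i;
}
```

Since the UCN is never converted, the identifier stays plain ASCII in a C
string, and any decision about the encoding can be deferred to a later
stage.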
Kenneth