[cfe-dev] Wide strings and clang::StringLiteral.

Wed Dec 3 09:31:56 PST 2008

Paolo Bolzoni wrote:
> dear cfe-devs,
>
> I think we are worrying about less important details. Universal character
> names in identifiers are, of course, important. But I think it is much more
> urgent to find a way to manage wide strings correctly.
>   
Universal character names are also easy to handle, and they provide an 
alternate way to represent Unicode characters in wide strings.
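
To illustrate the point about UCNs in wide strings, here is a minimal
sketch. The function names are made up for this example, and it assumes
that wchar_t holds Unicode code points on the target, which is common
but not guaranteed by the standard.

```cpp
#include <cstddef>
#include <cwchar>

// Two spellings of the same four-character wide string ("cafe" with an
// e-acute as the last character).
// Assumption: wchar_t values are Unicode code points on this platform.
// These helper names are hypothetical, not part of clang.

// Spelled with a universal character name; the source file stays pure ASCII.
inline const wchar_t* cafe_with_ucn() { return L"caf\u00e9"; }

// Spelled with a hexadecimal escape giving the code unit value directly.
inline const wchar_t* cafe_with_hex() { return L"caf\xe9"; }

// True when both wide strings hold the same sequence of wchar_t values.
inline bool same_wide_string(const wchar_t* a, const wchar_t* b) {
    return std::wcscmp(a, b) == 0;
}
```

Under that assumption both literals compare equal, so a UCN gives a
portable ASCII-only way to write the character.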
> Personally I have never seen identifiers with extended characters, but I can
> easily imagine L"non ascii string" in non-English programs.
>
> So what about focusing on a normalized way to memorize wide strings and
> thinking about extended characters in identifiers later?
>   
As I see it (speaking naively, I have no real experience here), the 
"normalized" way to memorize wide strings would be to abstract the 
encoding into a class that chooses "on the fly" between UTF-8, UTF-16, 
UCS-2, and UTF-32.  It should be possible to switch this class between 
conversion libraries at compile time.  [Note that the required width of 
wchar_t depends on what character set one is supporting: it should be at 
least 16 bits (for UCS-2) or at least 32 bits (for UTF-32).  As UTF-8 and 
UTF-16 are variable-length encodings, they are strictly disallowed for 
wchar_t (and wide strings), but will be allowed for Unicode strings if 
those make it into the next standard.]
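
A rough sketch of such a class, under stated assumptions: the literal is
stored as Unicode code points and encoded only on request.  The names
Encoding and EncodedString are hypothetical (this is not
clang::StringLiteral), and the conversion is written by hand here where
a real version would delegate to whichever conversion library was
selected at compile time.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical names for illustration only.
enum class Encoding { UTF8, UTF16, UCS2, UTF32 };

class EncodedString {
public:
    explicit EncodedString(std::vector<std::uint32_t> code_points)
        : code_points_(std::move(code_points)) {}

    // Writes the code units for the requested encoding into `units`.
    // Returns false if a code point cannot be represented (an invalid
    // scalar value, or a non-BMP character requested as UCS-2).
    bool encode(Encoding enc, std::vector<std::uint32_t>& units) const {
        units.clear();
        for (std::uint32_t cp : code_points_) {
            if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
            switch (enc) {
            case Encoding::UTF32:
                units.push_back(cp);
                break;
            case Encoding::UCS2:  // fixed width, BMP only
                if (cp > 0xFFFF) return false;
                units.push_back(cp);
                break;
            case Encoding::UTF16:  // surrogate pair above the BMP
                if (cp <= 0xFFFF) {
                    units.push_back(cp);
                } else {
                    std::uint32_t v = cp - 0x10000;
                    units.push_back(0xD800 + (v >> 10));
                    units.push_back(0xDC00 + (v & 0x3FF));
                }
                break;
            case Encoding::UTF8:  // one to four bytes per code point
                if (cp < 0x80) {
                    units.push_back(cp);
                } else if (cp < 0x800) {
                    units.push_back(0xC0 | (cp >> 6));
                    units.push_back(0x80 | (cp & 0x3F));
                } else if (cp < 0x10000) {
                    units.push_back(0xE0 | (cp >> 12));
                    units.push_back(0x80 | ((cp >> 6) & 0x3F));
                    units.push_back(0x80 | (cp & 0x3F));
                } else {
                    units.push_back(0xF0 | (cp >> 18));
                    units.push_back(0x80 | ((cp >> 12) & 0x3F));
                    units.push_back(0x80 | ((cp >> 6) & 0x3F));
                    units.push_back(0x80 | (cp & 0x3F));
                }
                break;
            }
        }
        return true;
    }

private:
    std::vector<std::uint32_t> code_points_;  // encoding-neutral storage
};
```

Keeping code points as the stored form means the choice of wchar_t width
only matters at the point where the units are emitted.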

----

Well... why not "just do it" and adjust the C/C++ identifier length 
detection (as C strings) to leave UCNs *unaltered*?
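
A minimal sketch of what that could look like, assuming the source is
available as a plain byte string.  The function names ucn_length and
identifier_length are invented for this example and are not clang's
lexer API.  The sketch only measures the identifier; it does not
translate the UCN or check that the code point is one the standard
allows in identifiers.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Length of a UCN (\uXXXX or \UXXXXXXXX) starting at s[pos], or 0 if the
// text at that position is not a well-formed UCN.
inline std::size_t ucn_length(const std::string& s, std::size_t pos) {
    if (pos + 1 >= s.size() || s[pos] != '\\') return 0;
    std::size_t digits = s[pos + 1] == 'u' ? 4 : s[pos + 1] == 'U' ? 8 : 0;
    if (digits == 0 || pos + 2 + digits > s.size()) return 0;
    for (std::size_t i = 0; i < digits; ++i) {
        if (!std::isxdigit(static_cast<unsigned char>(s[pos + 2 + i]))) return 0;
    }
    return 2 + digits;
}

// Length in bytes of the identifier starting at s[pos], or 0 if none.
// A UCN counts as an identifier character and is skipped over as-is.
inline std::size_t identifier_length(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '_' || std::isalpha(c) || (i > pos && std::isdigit(c))) {
            ++i;  // ordinary identifier character
        } else if (std::size_t n = ucn_length(s, i)) {
            i += n;  // keep the UCN spelling untouched
        } else {
            break;
        }
    }
    return i - pos;
}
```

The identifier text can then be copied out with substr() and still
contains the original backslash spelling.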

(Note that since I need to clone the test suite and iron out the MinGW32 
bugs before actually doing serious work on constructing an LLVM backend 
for my vaporware C/C++/FORTRAN compiler, this is somewhat of a "noise" 
suggestion -- but it is very simple, and it is what I'm doing for 
preprocessing.)

Kenneth