[cfe-dev] Convert char to wchar_t?

Steve Ramsey clang at lucena.com
Sun May 27 12:06:36 PDT 2012

On May 26, 2012, at 4:07 PM, Aaron Wishnick wrote:
> I'm working on bug 11789: http://llvm.org/bugs/show_bug.cgi?id=11789
> As part of it, I need to take a C string for a PredefinedExpr, and emit the equivalent wchar_t string literal. Right now my code just casts chars to wchar_t, but I realize this isn't right, since it will break whenever clang is targeting a platform other than that which it was compiled on. My question is, is there an example of existing code for taking a string, encoded as char, and turning it into an array of bytes appropriately encoded as wchar_t for the target platform? If not, could anybody point me to a safe way to do this? Thanks!

The answer may be non-trivial. Since I am not familiar with clang internals or the problem requirements, two questions first:

1) How is the original char string encoded? 
2) What is the target encoding of the wchar_t string?

Since we’re talking about __FUNCTION__, it’s conceivable that the characters involved will always come from the ASCII subset, in which case looping over the string and casting each char is the simple way to do it. However, since we’re talking about mangled names, we could be dealing with implementation details that complicate things. For example, if the original string is in a multi-byte encoding based on the user locale (unlikely), this is going to be an ugly problem that involves either calling platform-specific system libraries, with much careful documenting, or just linking in ICU and wallowing in the horror. If the original string is always UTF-8 and the result string is always UTF-16 LE, then it’s possible to write a dense, though clean, conversion function (UTF-8 conversions are more complicated than you might think), and the resulting code will work everywhere. If we’re dealing with some Microsoft-specific code page that is not limited to ASCII, but is not multi-byte either, then the conversion will be a trivial loop over a lookup table or function.
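To make the UTF-8 to UTF-16 LE case concrete, here is a rough sketch of what such a conversion might look like. It assumes the input really is UTF-8 and the target wchar_t is 16 bits wide, neither of which I know to be true for clang. The names utf8ToUtf16 and toUtf16LEBytes are made up for illustration and are not clang or LLVM functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a clang API): decode UTF-8 and emit UTF-16 code
// units. Returns false on malformed input (bad lead byte, truncated or
// overlong sequence, encoded surrogate, or value above U+10FFFF).
bool utf8ToUtf16(const std::string &in, std::vector<std::uint16_t> &out) {
  // Smallest code point that legitimately needs a sequence of each length.
  static const std::uint32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
  out.clear();
  std::size_t i = 0, n = in.size();
  while (i < n) {
    unsigned char b0 = static_cast<unsigned char>(in[i]);
    std::uint32_t cp;
    std::size_t len;
    if (b0 < 0x80)                { cp = b0;        len = 1; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
    else return false;            // stray continuation or invalid lead byte
    if (n - i < len) return false; // sequence runs past the end of input
    for (std::size_t k = 1; k < len; ++k) {
      unsigned char b = static_cast<unsigned char>(in[i + k]);
      if ((b & 0xC0) != 0x80) return false; // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < minForLen[len] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF))
      return false;
    if (cp < 0x10000) {
      out.push_back(static_cast<std::uint16_t>(cp));
    } else {
      // Code points beyond the BMP become a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<std::uint16_t>(0xD800 | (cp >> 10)));
      out.push_back(static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)));
    }
    i += len;
  }
  return true;
}

// Hypothetical helper: serialize code units as little-endian bytes,
// independent of the byte order of the machine running the compiler.
std::vector<unsigned char>
toUtf16LEBytes(const std::vector<std::uint16_t> &units) {
  std::vector<unsigned char> bytes;
  bytes.reserve(units.size() * 2);
  for (std::uint16_t u : units) {
    bytes.push_back(static_cast<unsigned char>(u & 0xFF)); // low byte first
    bytes.push_back(static_cast<unsigned char>(u >> 8));
  }
  return bytes;
}
```

Since ASCII is a subset of UTF-8, the same function covers the plain-ASCII case: each char comes out as one 16-bit unit with the same value, which is exactly what the cast loop produces. A target with a 32-bit wchar_t would skip the surrogate step and emit the decoded code point directly.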

So what’s the problem domain?

