[cfe-dev] [libc++] using std::wstring_convert and std::codecvt_utf16 to convert const char16_t* to wstring

Howard Hinnant hhinnant at apple.com
Fri Dec 16 19:09:49 PST 2011


I could've just as easily missed your question on StackOverflow (or been on vacation or whatever).  Don't hesitate to post here too.  Glad the problem got solved.  Sorry for the API you're having to deal with.  Wouldn't have been my first choice.

Howard

On Dec 16, 2011, at 10:07 PM, Ryan Ericson wrote:

> -Cfe-dev
> 
> I'm sorry for being impatient and asking it here again before I got
> the answer =). Now that I know you (and Cubbi) are on StackOverflow,
> I'll use that for questions like these. =)
> 
> Thanks!!!
> 
> On Fri, Dec 16, 2011 at 4:13 PM, Howard Hinnant <hhinnant at apple.com> wrote:
>> On Dec 16, 2011, at 6:09 PM, Ryan Ericson wrote:
>> 
>>> Hi,
>>> 
>>> I'm learning about Unicode support in C++ and I'm trying to convert
>>> const char16_t* (UTF-16) to wstring (UCS4). Reading the standard (and
>>> if I understand it correctly), it can be done through
>>> std::codecvt_utf16:
>>> 
>>> "For the facet codecvt_utf16:
>>> — The facet shall convert between UTF-16 multibyte sequences and UCS2
>>> or UCS4 (depending on the size of Elem) within the program."
>>> 
>>> So I tried to use std::wstring_convert to do the conversion by doing
>>> the following:
>>> 
>>> #include <iostream>
>>> #include <locale>
>>> #include <codecvt>
>>> #include <string>
>>> 
>>> using namespace std;
>>> 
>>> int main()
>>> {
>>>     u16string s;
>>> 
>>>     s.push_back('h');
>>>     s.push_back('e');
>>>     s.push_back('l');
>>>     s.push_back('l');
>>>     s.push_back('o');
>>> 
>>>     wstring_convert<codecvt_utf16<wchar_t>, wchar_t> conv;
>>>     wstring ws = conv.from_bytes(reinterpret_cast<const char*> (s.c_str()));
>>> 
>>>     wcout << ws << endl;
>>> 
>>>     return 0;
>>> }
>>> 
>>> Note: the explicit push_backs are there to get around the fact that my
>>> version of clang (Xcode 4.2) doesn't have Unicode string literals.
>>> 
>>> When the code is run, I get a terminate exception from from_bytes. Am I
>>> misunderstanding something and doing something illegal here? I was
>>> thinking it should work because the const char* that I passed to
>>> wstring_convert is UTF-16 encoded. I have also considered endianness
>>> being the issue, but I have checked that it's not the case.
>>> If that is indeed not going to work, what would be the best approach
>>> to convert UTF-16 to UCS4 using standard C++11?
>> 
>> Cubbi beat me to figuring this out by 58 minutes. ;-)
>> 
>> Howard
>> 
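The thread does not reproduce the fix itself, so here is a minimal sketch of one way to make the conversion in the question work. It is not taken from the messages above. It relies on two properties of the standard facilities: the single-pointer overload of from_bytes treats its argument as a null-terminated byte string (so UTF-16 text containing zero bytes is cut short), and codecvt_utf16 reads big-endian input unless std::little_endian is passed as its mode. The helper names host_is_little_endian, convert_with_mode and utf16_to_wide are hypothetical, and the sketch assumes the char16_t data is stored in the host's native byte order. Note that <codecvt> and std::wstring_convert are deprecated as of C++17.

```cpp
#include <cassert>
#include <codecvt>
#include <cstdint>
#include <cstring>
#include <locale>
#include <string>

// Hypothetical helper: report whether this machine stores the low byte first.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Hypothetical helper: run the conversion with a fixed byte-order mode.
template <std::codecvt_mode Mode>
std::wstring convert_with_mode(const std::u16string& s) {
    std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10ffff, Mode>, wchar_t> conv;
    // Pass an explicit byte range so embedded zero bytes do not end the input.
    const char* first = reinterpret_cast<const char*>(s.data());
    const char* last = first + s.size() * sizeof(char16_t);
    return conv.from_bytes(first, last);  // throws std::range_error on bad input
}

// Convert native-endian UTF-16 to a wide string (UCS4 where wchar_t is 32 bits).
inline std::wstring utf16_to_wide(const std::u16string& s) {
    if (host_is_little_endian()) {
        return convert_with_mode<std::little_endian>(s);
    }
    // Mode 0 is the facet's default, which reads big-endian input.
    return convert_with_mode<static_cast<std::codecvt_mode>(0)>(s);
}
```

With this helper, the program in the question could build its u16string as before and print utf16_to_wide(s) through wcout. On platforms where wchar_t is 16 bits wide, the facet produces UCS2 rather than UCS4, so code points outside the Basic Multilingual Plane cannot be represented.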




