[cfe-dev] [libc++] using std::wstring_convert and std::codecvt_utf16 to convert const char16_t* to wstring

Howard Hinnant hhinnant at apple.com
Fri Dec 16 16:13:41 PST 2011


On Dec 16, 2011, at 6:09 PM, Ryan Ericson wrote:

> Hi,
> 
> I'm learning about Unicode support in C++ and I'm trying to convert
> const char16_t* (UTF-16) to wstring (UCS4). Reading the standard (and
> if I understand it correctly), it can be done through
> std::codecvt_utf16:
> 
> "For the facet codecvt_utf16:
> — The facet shall convert between UTF-16 multibyte sequences and UCS2
> or UCS4 (depending on the size of Elem) within the program."
> 
> So I tried to use std::wstring_convert to do the conversion by doing
> the following:
> 
> #include <iostream>
> #include <locale>
> #include <codecvt>
> #include <string>
> 
> using namespace std;
> 
> int main()
> {
>     u16string s;
> 
>     s.push_back('h');
>     s.push_back('e');
>     s.push_back('l');
>     s.push_back('l');
>     s.push_back('o');
> 
>     wstring_convert<codecvt_utf16<wchar_t>, wchar_t> conv;
>     wstring ws = conv.from_bytes(reinterpret_cast<const char*> (s.c_str()));
> 
>     wcout << ws << endl;
> 
>     return 0;
> }
> 
> Note: the explicit push_backs are there to get around the fact that my
> version of clang (Xcode 4.2) doesn't support Unicode string literals.
> 
> When the code is run, I get a terminate exception from from_bytes. Am I
> misunderstanding something and doing something illegal here? I was
> thinking it should work because the const char* that I passed to
> wstring_convert is UTF-16 encoded. I have also considered endianness
> being the issue, but I have checked that it's not the case.
> If that is indeed not going to work, what would be the best approach
> to convert UTF-16 to UCS4 using standard C++11?

Cubbi beat me to figuring this out by 58 minutes. ;-)

Howard
