<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/89676>89676</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[libc++] wstring_convert::from_bytes Fails on Identity Conversions
</td>
</tr>
<tr>
<th>Labels</th>
<td>
libc++
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
tommkelly
</td>
</tr>
</table>
<pre>
It appears that [`wstring_convert::from_bytes(const char*, const char*)`](https://en.cppreference.com/w/cpp/locale/wstring_convert/from_bytes) can fail erroneously--even throwing an exception--when specialized with the Identity Conversion and `Elem` type `char`, that is:
```
std::wstring_convert<std::codecvt<char, char, std::mbstate_t>, char>
```
This came up for me when writing cross-platform code meant to compile on both Windows and Linux. I needed to format file names as input for [`std::filesystem::file_size`](https://en.cppreference.com/w/cpp/filesystem/file_size), so I defined a `wstring_convert` in terms of `std::filesystem::path::value_type`, which should be `wchar_t` on Windows and `char` on Linux.
For the latter, one would expect `from_bytes` to return a `basic_string<char>` exactly equal to the input. Instead, the method threw a `range_error` (the expected behavior when `from_bytes` encounters an error and the user hasn't provided an error `wstring`).
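A minimal sketch of the scenario described above, not the exact code from my project. `IdentityCodecvt`, `ConversionOutcome`, and `try_identity_from_bytes` are hypothetical names introduced only for illustration; the wrapper struct is an assumption needed because the destructor of `std::codecvt` is protected, while `wstring_convert` has to delete the facet it owns. Note that `wstring_convert` is deprecated since C++17, so a compiler may warn here.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: exposes a public destructor so that
// std::wstring_convert can own and delete the identity facet.
struct IdentityCodecvt : std::codecvt<char, char, std::mbstate_t> {
    ~IdentityCodecvt() {}
};

// Hypothetical result type: records whether from_bytes returned normally.
struct ConversionOutcome {
    bool converted;    // false if from_bytes threw std::range_error
    std::string text;  // converted text, empty on failure
};

// Runs the identity conversion and reports the outcome instead of
// letting the exception propagate.
ConversionOutcome try_identity_from_bytes(const std::string& input) {
    std::wstring_convert<IdentityCodecvt, char> converter;
    try {
        const char* first = input.data();
        return {true, converter.from_bytes(first, first + input.size())};
    } catch (const std::range_error&) {
        return {false, std::string()};
    }
}
```

Calling `try_identity_from_bytes("file.txt")` should, by the expectation stated above, yield `converted == true` with `text` equal to the input; with the behavior reported here, `converted` comes back `false`.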
I believe the culprit lies [here](https://github.com/llvm/llvm-project/blob/b8ff08d0e668e5397dd799b76ede0bd54fcba75c/libcxx/include/locale#L3225) in **llvm-project/libcxx/include/locale**:
```
__r = __cvtptr_->in(__st, __frm, __frm_end, __frm_nxt, __to, __to_end, __to_nxt);
__cvtcount_ += __frm_nxt - __frm;
if (__frm_nxt == __frm) {
__r = codecvt_base::error;
} else if (__r == codecvt_base::noconv) {
__ws.resize(__to - &__ws[0]);
//This only gets executed if _Elem is char
__ws.append((const _Elem*)__frm, (const _Elem*)__frm_end);
__frm = __frm_nxt;
__r = codecvt_base::ok;
} else if ...
```
From the [documentation](https://en.cppreference.com/w/cpp/locale/codecvt/in) of `std::codecvt::in`:
> Leaves `from_next` and `to_next` pointing one beyond the last element successfully converted.
...
If this `codecvt` facet does not define a conversion, no characters are converted. `to_next` is set to be equal to `to`, `state` is unchanged, and [`std::codecvt_base::noconv`](https://en.cppreference.com/w/cpp/locale/codecvt_base) is returned.
This unfortunately doesn't specify the expected value of `from_next` in the `noconv` case, but it is reasonable to infer that it at least *may* behave like `to_next`, i.e. `from_next` is set to `from` when `in` returns `std::codecvt_base::noconv`. My own observations in a debugger corroborate this.
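In other words, it seems that this implementation of `from_bytes` circumvents its own `noconv` case by first checking whether `__frm_nxt == __frm`: for an identity conversion that check succeeds, so `__r` is overwritten with `codecvt_base::error` before the `noconv` branch can run.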
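The `from_next` observation can also be checked without a debugger. The following is a rough sketch that calls `in` directly on the `codecvt<char, char, mbstate_t>` facet of the classic locale; `InProbe` and `probe_identity_in` are hypothetical names, and the sketch only reports what the facet does rather than asserting what the standard requires.

```cpp
#include <cwchar>
#include <locale>

// Hypothetical record of what codecvt::in reported for an identity conversion.
struct InProbe {
    std::codecvt_base::result result;
    bool from_next_is_from;  // true if from_next was left equal to from
    bool to_next_is_to;      // true if to_next was left equal to to
};

InProbe probe_identity_in() {
    typedef std::codecvt<char, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(std::locale::classic());

    const char src[] = "abc";
    char dst[sizeof src] = {};
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    char* to_next = nullptr;

    // Ask the facet to "convert" the three characters of src into dst.
    std::codecvt_base::result r =
        facet.in(state, src, src + 3, from_next, dst, dst + 3, to_next);

    return {r, from_next == src, to_next == dst};
}
```

If `probe_identity_in()` reports `noconv` together with `from_next_is_from == true`, then the `__frm_nxt == __frm` check in the snippet quoted earlier is taken for every identity conversion.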
</pre>