<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/60177">60177</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [libc++] Codecvts that decode UTF-8 can incorrectly return partial instead of error
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          dimztimz
      </td>
    </tr>
</table>

<pre>
    All codecvt facets that decode UTF-8 to other Unicode encodings (UTF-32, UTF-16 and UCS-2) have a bug: in one case they return `partial` instead of `error`. This happens when the input range ends with an incomplete code unit sequence, but the trailing bytes that are visible already contain an error. Here is an example that reproduces the bug.

```cpp
#include <locale>
#include <cassert>
#include <cwchar> // mbstate_t

using namespace std;

int main()
{
        char in[]                 =  "bш\uAAAA\U0010AAAA";
        const char32_t expected[] = U"bш\uAAAA\U0010AAAA";
        char32_t out[ sizeof(expected)/sizeof(expected[0]) ];
        
        auto& loc = locale::classic();
        auto& cvt = use_facet<std::codecvt<char32_t, char, mbstate_t>>(loc);
        auto state = mbstate_t{};
        const char* in_next;
        char32_t* out_next;

        // \uAAAA is located at bytes in[3..5]
        auto orig = in[4];
        in[4] = 'z'; // we malform the second byte of the 3-byte sequence of \uAAAA

        // this call does not see the last byte of \uAAAA, it can only see its
        // first 2 bytes. If those 2 bytes were correct it should return partial,
        // but we malformed the second (and visible) byte. It should return error.
        auto res = cvt.in (state, in, in + 5, in_next, out, out + 5, out_next);
        assert(res == cvt.error); // error is expected, but partial is returned

        in[4] = orig;

        // \U0010AAAA is located at bytes in[6..9]
        orig = in[7];
        in[7] = 'z'; // we malform the second byte of the 4-byte sequence of \U0010AAAA

        // this call does not see the last byte of \U0010AAAA, it can only see its
        // first 3 bytes. If those 3 bytes were correct it should return partial,
        // but we malformed the second (and visible) byte. It should return error.
        res = cvt.in (state, in, in + 9, in_next, out, out + 5, out_next);
        assert(res == cvt.error); // error is expected, but partial is returned

        in[7] = orig;

        // this can also be reproduced by malforming in[8] instead
}
```
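
The expected behaviour can be sketched as a small standalone classifier. This is only an illustration, not the libc++ implementation: the names `seq_status` and `classify_utf8_sequence` are hypothetical, and the byte ranges follow the well-formed UTF-8 table from the Unicode Standard. The point it shows is that every visible byte of the trailing sequence is validated before the input is reported as merely truncated.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch (not a libc++ type).
enum class seq_status { ok, partial, error };

// Classify the UTF-8 sequence that starts at `first`, when only the bytes
// in [first, last) are visible. Hypothetical helper, for illustration only.
seq_status classify_utf8_sequence(const unsigned char* first,
                                  const unsigned char* last)
{
    assert(first < last); // at least the lead byte must be visible

    const unsigned char lead = *first;
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF; // allowed range of the second byte

    if (lead <= 0x7F) {
        return seq_status::ok; // single-byte (ASCII) sequence
    } else if (lead >= 0xC2 && lead <= 0xDF) {
        len = 2;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        len = 3;
        if (lead == 0xE0) lo = 0xA0; // rejects overlong encodings
        if (lead == 0xED) hi = 0x9F; // rejects surrogates
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        len = 4;
        if (lead == 0xF0) lo = 0x90; // rejects overlong encodings
        if (lead == 0xF4) hi = 0x8F; // rejects values above U+10FFFF
    } else {
        return seq_status::error; // invalid lead byte
    }

    const std::size_t visible = static_cast<std::size_t>(last - first);
    const std::size_t n = visible < len ? visible : len;

    // Validate every trailing byte that is visible, even if the sequence
    // is truncated. A bad visible byte means error, not partial.
    for (std::size_t i = 1; i < n; ++i) {
        const unsigned char b = first[i];
        const unsigned char b_lo = (i == 1) ? lo : 0x80;
        const unsigned char b_hi = (i == 1) ? hi : 0xBF;
        if (b < b_lo || b > b_hi)
            return seq_status::error;
    }

    // All visible bytes are fine; the sequence is either complete or truncated.
    return visible < len ? seq_status::partial : seq_status::ok;
}
```

With the bytes from the reproducer, the visible prefix `EA AA` of \uAAAA classifies as partial, while the malformed prefix `EA 7A` (second byte replaced by 'z') classifies as error.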
</pre>