[libcxx-commits] [PATCH] D64818: [libc++] Implement missing filesystem::path constructors with locale

Tom Honermann via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Wed Nov 1 13:17:34 PDT 2023


tahonermann requested changes to this revision.
tahonermann added inline comments.
This revision now requires changes to proceed.


================
Comment at: libcxx/include/__locale:1566-1579
+    while (__nb < __ne && __r != codecvt_base::error) {
+        static const int __sz = 32;
+        _InternT __buf[__sz];
+        _InternT* __bn;
+        const char* __nn = __nb;
+        __r = __cvt.in(__mb, __nb, __ne - __nb > __sz ? __nb + __sz : __ne, __nn, __buf, __buf + __sz, __bn);
+        if (__r == codecvt_base::error || __nn == __nb)
----------------
This can end up in an infinite loop when the narrow input string ends in a partial code unit sequence. In that case, `partial` will be returned, but `from_next` (aka `__nn`) will be left pointing to the beginning of the partial sequence such that the next iteration will encounter the same partial result.

See https://godbolt.org/z/3q5xaG5dW for an example. The test there reliably exhibits an infinite loop for `std::codecvt<char32_t, char, std::mbstate_t>`. For the `std::codecvt<wchar_t, char, std::mbstate_t>` facet, the infinite loop is avoided because the converter packs partial state into `std::mbstate_t` and then (erroneously) returns `ok` instead of `partial` (erroneously because no character is ever translated, but no error is ever issued either).

These appear to be existing bug given that this code was just factored out from other locations.


================
Comment at: libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source_and_locale.pass.cpp:129
+  }
+
+  for (auto const& MS : PathList) {
----------------
I think additional tests should be added here to exercise various edge cases like the truncated partial code unit sequence in the godbolt link I added in another comment. We should also exercise some MxN sequence conversions (e.g., UTF-8 to UTF-16). That can be done by calling `.u16string()` on the resulting `path` object and comparing what it returns to expectations.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64818/new/

https://reviews.llvm.org/D64818



More information about the libcxx-commits mailing list