[libcxx-commits] [PATCH] D64818: [libc++] Implement missing filesystem::path constructors with locale

Tom Honermann via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Thu Oct 26 19:22:04 PDT 2023


tahonermann added a comment.

I think this is looking good. Before I bless it though, I want to make sure the intent is clear.

The new locale sensitive constructors will use the `codecvt<wchar_t, char, ...>` facet from the specified locale to convert the provided `char`-based source (locale dependent encoding) to a `wchar_t`-based representation (UTF-16 or UTF-32 depending on platform). If on Windows, done. If on POSIX, the `wchar_t`-based representation is then converted to UTF-8 in `char`-based storage.

That seems like the right semantics; or at least the best that we can do given that filenames don't have strongly associated encodings. I haven't managed to validate for myself yet; is this consistent with the non-locale based constructors that take a `char`-based source? I presume those are assumed to be UTF-8 on POSIX regardless of whatever the current locale settings are.

If it isn't too difficult to do, it would be good to use a real locale (e.g., one that uses ISO-8859-1 encoding) and validate the conversion to UTF-8 (POSIX) or UTF-16 (Windows) in the test with some characters that require re-encoding. The existing test validates that all the characters are changed to `'o'`, but doesn't exercise a change to encoded length or that the characters didn't undergo a further conversion (exercising non-ASCII characters would help with that).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64818/new/

https://reviews.llvm.org/D64818



More information about the libcxx-commits mailing list