[libcxx-commits] [PATCH] D64818: [libc++] Implement missing filesystem::path constructors with locale
Tom Honermann via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Thu Oct 26 19:22:04 PDT 2023
tahonermann added a comment.
I think this is looking good. Before I bless it though, I want to make sure the intent is clear.
The new locale sensitive constructors will use the `codecvt<wchar_t, char, ...>` facet from the specified locale to convert the provided `char`-based source (locale dependent encoding) to a `wchar_t`-based representation (UTF-16 or UTF-32 depending on platform). If on Windows, done. If on POSIX, the `wchar_t`-based representation is then converted to UTF-8 in `char`-based storage.
That seems like the right semantics; or at least the best that we can do given that filenames don't have strongly associated encodings. I haven't managed to validate for myself yet; is this consistent with the non-locale based constructors that take a `char`-based source? I presume those are assumed to be UTF-8 on POSIX regardless of whatever the current locale settings are.
If it isn't too difficult to do, it would be good to use a real locale (e.g., one that uses ISO-8859-1 encoding) and validate the conversion to UTF-8 (POSIX) or UTF-16 (Windows) in the test with some characters that require re-encoding. The existing test validates that all the characters are changed to `'o'`, but doesn't exercise a change to encoded length or that the characters didn't undergo a further conversion (exercising non-ASCII characters would help with that).
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64818/new/
https://reviews.llvm.org/D64818
More information about the libcxx-commits
mailing list