[libcxx-commits] [PATCH] D146398: [libcxx] Fix using std::wcout/wcin on Windows with streams configured in wide mode
Martin Storsjö via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Wed Apr 26 14:09:53 PDT 2023
mstorsjo added a comment.
Sorry for the late follow-up here; I had other things to prioritize.
In D146398#4285271 <https://reviews.llvm.org/D146398#4285271>, @tahonermann wrote:
>> It does something somewhat reasonable at least; `fopen(); fputwc()` ... on Linux writes the wchars as plain ASCII, anything outside of the ASCII range seems to produce literal ? chars.
>
> The `fputwc` man page claims conversion to the locale encoding; which defaults to "C" unless `setlocale()` has been called. So ASCII by default, but maybe not.
Oh indeed, after a `setlocale()` call, `fputwc()` does work as expected and converts to the locale's charset (UTF-8 in my case). The `b` flag to `fopen()` doesn't make any difference.
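For reference, a minimal sketch of that kind of experiment on a POSIX system. The helper names, the scratch file name and the candidate locale names (`C.UTF-8`, `en_US.UTF-8`) are my own assumptions, not part of the patch, and those locales may not be installed everywhere:

```cpp
#include <cassert>
#include <clocale>
#include <cstdio>
#include <cwchar>
#include <string>

// Hypothetical helper: try to switch the C locale to some UTF-8 locale.
// The candidate names are assumptions; returns false if none is available.
bool try_utf8_locale() {
    const char* candidates[] = {"C.UTF-8", "en_US.UTF-8"};
    for (const char* name : candidates) {
        if (std::setlocale(LC_ALL, name) != nullptr)
            return true;
    }
    return false;
}

// Hypothetical helper: write one wide character with fputwc() to a scratch
// file and return the bytes that actually ended up in that file.
std::string fputwc_bytes(wchar_t wc, const char* path) {
    assert(path != nullptr);
    std::FILE* out = std::fopen(path, "wb");
    if (out == nullptr)
        return std::string();
    std::fputwc(wc, out);  // may fail if wc is not representable in the locale
    std::fclose(out);

    // Read the file back through a fresh, byte-oriented stream.
    std::string bytes;
    if (std::FILE* in = std::fopen(path, "rb")) {
        int c;
        while ((c = std::fgetc(in)) != EOF)
            bytes.push_back(static_cast<char>(c));
        std::fclose(in);
    }
    std::remove(path);
    return bytes;
}
```

Calling `fputwc_bytes(L'\u00e9', "fputwc_demo.bin")` before and after a successful `try_utf8_locale()` shows the difference: once a UTF-8 locale is active, the file should contain the two-byte UTF-8 encoding of the character.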
>> Can you show examples on how one could experimentally try out imbuing a nondefault locale for the wcout/wcerr/wcin streams, to experimentally see how MS STL behaves in that case?
>
> Here you go: https://godbolt.org/z/esP1535P4. Interesting, it looks like the Microsoft and libc++ implementations both call the `codecvt` members for each individual character while libstdcxx batches them. (I had to test MSVC locally; I look forward to someday being able to run MSVC generated code on godbolt.org again!)
Thanks! And with that example, if I add `_setmode(_fileno(stdout), _O_WTEXT);`, the output gets totally garbled. If I then remove the call to `imbue`, the output again works as expected.
So on Windows with MS STL, by default both `cout` and `wcout` work fine, and I can imbue a locale with a custom conversion on `wcout`; the conversion is honored and works. If I set stdout to Unicode mode, plain `cout` no longer works, and neither does a `wcout` with an imbued locale, since the imbued locale forces a conversion to `char`.
I guess that's a behaviour we should be able to match.
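To illustrate what "imbuing a locale with a custom conversion" means, here is a portable sketch that does it on a `std::wofstream` rather than on `wcout`, so it leaves out the Windows-specific `_setmode()` part. It is not the godbolt example quoted above; the facet `AsciiOrQuestionMark` and the helper `write_through_facet` are hypothetical names of mine. The facet narrows each `wchar_t` to `char`, replaces anything outside ASCII with `?`, and counts how often `do_out` is called:

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical facet: narrows wchar_t to char, writing '?' for non-ASCII.
class AsciiOrQuestionMark : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit AsciiOrQuestionMark(int* calls) : calls_(calls) {
        assert(calls != nullptr);
    }

protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        ++*calls_;  // lets the caller see whether the library batches calls
        from_next = from;
        to_next = to;
        while (from_next != from_end && to_next != to_end) {
            const unsigned long code = static_cast<unsigned long>(*from_next++);
            *to_next++ = code < 0x80 ? static_cast<char>(code) : '?';
        }
        return from_next == from_end ? ok : partial;
    }
    result do_unshift(std::mbstate_t&, char* to, char*,
                      char*& to_next) const override {
        to_next = to;  // stateless encoding: nothing to flush
        return noconv;
    }
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 1; }
    int do_max_length() const noexcept override { return 1; }

private:
    int* calls_;
};

// Hypothetical helper: write `text` through the facet and return the bytes
// that reached the file. The locale takes ownership of the facet.
std::string write_through_facet(const std::wstring& text, const char* path,
                                int* do_out_calls) {
    {
        std::wofstream out;
        // Imbue before opening, so the file buffer uses the facet from the start.
        out.imbue(std::locale(std::locale::classic(),
                              new AsciiOrQuestionMark(do_out_calls)));
        out.open(path, std::ios::binary);
        out << text;
    }  // the stream is flushed and closed here
    std::ifstream in(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    in.close();
    std::remove(path);
    return bytes;
}
```

The call counter is only there to make the per-character versus batched behaviour visible; the exact count depends on the standard library and its buffering, so I wouldn't rely on any particular value.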
>> So with that in mind, I guess we should try to keep the current codepaths that do conversions but either avoid hitting them or keeping the original wchar form around if noconv is returned... I guess I should sit down with the MS STL implementation and see if I can trace what codepaths it ends up using for e.g. wcout.
>
> I would expect that `noconv` will never be true when the internal and external character types differ since, at a minimum, a (potentially narrowing) copy has to be performed.
Right - I'll try to study this next.
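As a quick sanity check of that expectation, one can ask the classic locale's facets directly. This is a small sketch with helper names of my own choosing. The standard only guarantees the `char`-to-`char` result; `false` for the `wchar_t` facet is what I'd expect from the common implementations, not something the standard promises:

```cpp
#include <cwchar>
#include <locale>

// codecvt<char, char, mbstate_t> is the degenerate conversion; the standard
// says its always_noconv() returns true.
bool narrow_always_noconv() {
    using Facet = std::codecvt<char, char, std::mbstate_t>;
    return std::use_facet<Facet>(std::locale::classic()).always_noconv();
}

// For wchar_t -> char a (possibly narrowing) copy is needed, so this is
// expected (assumption) to return false.
bool wide_always_noconv() {
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    return std::use_facet<Facet>(std::locale::classic()).always_noconv();
}
```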
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D146398/new/
https://reviews.llvm.org/D146398