[libcxx-commits] [PATCH] D146398: [libcxx] Fix using std::wcout/wcin on Windows with streams configured in wide mode

Martin Storsjö via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Thu Apr 20 01:34:08 PDT 2023


mstorsjo added a comment.

In D146398#4281670 <https://reviews.llvm.org/D146398#4281670>, @tahonermann wrote:

>> and if using a `wchar_t` based C++ stream, the narrow char stream gets converted between `wchar_t` and utf8 in narrow chars
>
> Sorry, I wasn't able to parse that. I think I see what you mean though; the locale facet gets used to convert the `wchar_t` sequence to/from `char`.

Sorry for the unclear writing - yes, that's what I meant.

>> On Windows, you get the full unicode fidelity of the console by communicating with it in unicode mode, but actually writing wchars to it on Unix probably doesn't do any good...
>
> I agree writing `wchar_t` values directly wouldn't be good but I would expect the implementation of `fputwc()` to do something reasonable. However...

It does something somewhat reasonable at least: `fopen(); fputwc() ...` on Linux writes the wchars as plain ASCII, and anything outside of the ASCII range seems to produce literal `?` chars. On Windows, the same code converts the wchars to the system's native codepage. If I `fopen` the file in binary mode on Windows, the wchars are written as-is, without any conversion. On Linux, opening the file in binary mode doesn't make any difference.
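
For anyone wanting to reproduce this, a minimal sketch of the kind of test I mean is below. The helper name `bytes_from_fputwc` and the file name are made up for illustration; the helper writes a wide string with `fputwc()` and then reads the raw bytes back, so that the text-mode (`"w"`) and binary-mode (`"wb"`) results can be compared on each platform.

```cpp
#include <cstdio>
#include <cwchar>
#include <string>

// Hypothetical helper: write `text` to `path` with fputwc() using the given
// fopen() mode, then return the raw bytes that ended up in the file.
std::string bytes_from_fputwc(const char* path, const wchar_t* text, const char* mode) {
    std::FILE* f = std::fopen(path, mode);
    if (!f) return {};
    for (const wchar_t* p = text; *p; ++p) {
        // Stop at the first character the C library refuses to write.
        if (std::fputwc(*p, f) == WEOF) break;
    }
    std::fclose(f);

    // Read the file back in binary mode so no translation hides the result.
    std::string bytes;
    f = std::fopen(path, "rb");
    if (!f) return bytes;
    for (int c; (c = std::fgetc(f)) != EOF;) bytes.push_back(static_cast<char>(c));
    std::fclose(f);
    std::remove(path);
    return bytes;
}
```

Calling it with e.g. `bytes_from_fputwc("fputwc_demo.txt", L"abc", "w")` and again with `"wb"` and a string containing non-ASCII characters should show the differences described above; the exact bytes for non-ASCII input depend on the platform and the active C locale, so I'm not listing expected output for those.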

> Hmm, is it really standards conforming to not use the `std::codecvt<wchar_t, char, std::mbstate_t>` locale facet for `std::wcout` and friends? We can make an argument that a program that contains a call to `_setmode()` doesn't require conformance, but in the absence of such a call, I would think that we are required to use the imbued facet. As is, with these changes, if a user were to imbue their own facet, it would get ignored. Do you know how the Microsoft or libstdc++ implementations handle that?

Hmm, right - I briefly thought about that. I was pointed to e.g. https://github.com/microsoft/STL/blob/daa994bfc41c36196c536f2b68388f859d6bd656/stl/inc/fstream#L609-L634. For regular chars, there seems to be a fast path if there's no `_Pcvt`; otherwise it calls `_Mysb::xsputn`, and for wchars it always does the latter. I think https://github.com/microsoft/STL/blob/daa994bfc41c36196c536f2b68388f859d6bd656/stl/inc/fstream#L416-L467 is what might be handling the output... It seems that if `_Pcvt` is needed and signals that it did some conversion, the data is converted to char and written out; otherwise it is kept in its native form and written with `fputc`/`fputwc`.

On the other hand, if it ever does that, and the stream is set to unicode mode, it would end up trying to `fwrite` narrow char data to a unicode stream, which doesn't really work. So perhaps that case is simply not supported?

Can you show an example of how to imbue a nondefault locale for the wcout/wcerr/wcin streams, so that one could see experimentally how MS STL behaves in that case?

So with that in mind, I guess we should try to keep the current codepaths that do conversions, but either avoid hitting them or keep the original wchar form around if `noconv` is returned... I guess I should sit down with the MS STL implementation and see if I can trace what codepaths it ends up using for e.g. wcout.
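
As a possible starting point for that kind of experiment, here is a rough sketch of a user-provided facet, under the assumption that a file stream is a reasonable stand-in for checking that the facet itself works. `MarkerCodecvt` and `write_through_facet` are hypothetical names, not anything from libc++ or MS STL. The facet passes ASCII through and turns everything else into `*`, so it is easy to see in the output whether the facet was actually consulted.

```cpp
#include <cstdio>
#include <cwchar>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical facet for experiments: ASCII passes through unchanged,
// every other wide character is written as a single '*'.
class MarkerCodecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_out(state_type&, const intern_type* from, const intern_type* from_end,
                  const intern_type*& from_next, extern_type* to, extern_type* to_end,
                  extern_type*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            if (to_next == to_end) return partial;  // output buffer is full
            const wchar_t c = *from_next++;
            *to_next++ = (static_cast<unsigned long>(c) < 0x80) ? static_cast<char>(c) : '*';
        }
        return ok;
    }
    // The encoding is stateless, so there is never anything to unshift.
    result do_unshift(state_type&, extern_type* to, extern_type*,
                      extern_type*& to_next) const override {
        to_next = to;
        return noconv;
    }
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 1; }    // one char per wchar_t
    int do_max_length() const noexcept override { return 1; }
};

// Write `text` through a wofstream with the facet imbued, and return the
// bytes that ended up in the file.
std::string write_through_facet(const char* path, const std::wstring& text) {
    {
        std::wofstream out;
        // Imbue before opening, so the file buffer uses the facet from the start.
        out.imbue(std::locale(std::locale::classic(), new MarkerCodecvt));
        out.open(path, std::ios::binary);
        out << text;
    }  // closing the stream flushes it
    std::ifstream in(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    in.close();
    std::remove(path);
    return bytes;
}
```

With a file stream, `write_through_facet("marker_facet_demo.txt", L"a\u00e9b")` should give back `a*b`. The same locale object could be passed to `std::wcout.imbue(...)` to check whether a given standard library honors the facet for the console streams; I haven't verified what MS STL does in that case.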


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146398/new/

https://reviews.llvm.org/D146398


