[libcxx-commits] [PATCH] D146398: [libcxx] Fix using std::wcout/wcin on Windows with streams configured in wide mode

Martin Storsjö via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Thu Apr 13 23:19:56 PDT 2023


mstorsjo added a comment.

Hi @tahonermann, thanks for having a look!

In D146398#4266539 <https://reviews.llvm.org/D146398#4266539>, @tahonermann wrote:

> First, the code changes aren't actually sensitive to the mode of the file handle so the calls to `_setmode()` in the tests presumably have no behavioral impact (other than they may change the behavior of functions called by the changed code).

The "may change the behaviour of functions called by the changed code" is the core of the issue here. And if the `_setmode()` call is commented out in the tests, the test output does change (and the test fails).

Actually, as far as I know, there's no way to inspect the current mode set on a file descriptor - so even if we wanted to, we can't. (And other C++ libraries such as MS STL don't inspect the mode either.)

The core of the issue is that previously, when writing wchars, we converted them to a narrow charset (utf8, even if the right thing for Windows would be the current active codepage) and wrote them using `fwrite()`. Whenever writing with `fwrite()`, the data gets written as such, but when writing with `fputwc()`, each individual wchar gets passed through to the CRT without getting mangled, and the CRT converts it to the right output based on the file descriptor's current mode (which we can't observe outside of the CRT).

Thus, after this change, for wchar based stdout/stderr streams, we're always writing with `fputwc()`, which lets the CRT handle it according to mode. (For streams based on regular chars, we're still writing with `fwrite()` - which does break output when the streams are changed to wide mode. However both MS STL and GNU libstdc++ seem to fail to output things in that mode as well, so I would consider that combination pathological.)

> Second, but related to the first point above, the code changes are isolated to the `__stdinbuf` and `__stdoutbuf` classes which are only used to wrap `stdin`, `stdout`, and `stderr`; thus the changes won't impact other file handles for which `_setmode()` has been called. I don't think libc++ currently exposes the `FILE*` pointers held in `basic_filebuf` instances, but future changes may require doing so. See https://wg21.link/p1759 for example.

Right, that's a good point. But for the current implementation in e.g. `__stdoutbuf`, the writes get done at that level, so whatever extra logic we place in other classes won't get hit if `__stdoutbuf` does the writing on its own (unless we refactor it all significantly - which I'm not really wanting to take on at the moment).

Secondly, as a detail, that spec draft seems to expose the deepest native handle which seems to be meant to be a `HANDLE` on Windows - the suggested code for Windows uses `_fileno()` to get a file descriptor from a `FILE*`, and then uses `_get_osfhandle()` to get the underlying `HANDLE`. The modes set by `_setmode()` operate on the CRT level file descriptor - the value returned by `_fileno()` - not the underlying OS level `HANDLE`. So even with that `HANDLE` exposed from such a stream, I don't think you can access the CRT level file descriptor to change its mode. (With the Linux definition of `native_handle_type`, where it is the file descriptor returned by `fileno`, it would be possible to change the mode on Windows though.)

Thus, while that could be an issue in some future, requiring similar changes in other classes, I'd prefer to focus at the current state of affairs here.

> It probably is necessary to `#ifdef` the code to enable Windows-specific behavior. However, I don't think the `sizeof()` checks are what should be used for conditional behavior. Rather, I think the file handle mode should determine whether, e.g., `getwc()` or `getc()` is called.

Unfortunately - as mentioned above, the file descriptor mode isn't accessible - and I think it would make for some kinda clumsy code to be querying for it for every single write; neither MS STL nor GNU libstdc++ do that.

> If it really is necessary to conditionalize some of the code for `char_type`, then I think that should be done with `is_same_v<char_type, wchar_t>` rather than using `sizeof` as a proxy for the type.

Thanks, that would indeed be better; when I initially wrote this I wanted to do it like that, but my C++-fu isn't strong enough so I went with the simplest constant expression I could come up with, and forgot to attach a question about what the more correct conditional would be. I'll amend it to use that and update the patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146398/new/

https://reviews.llvm.org/D146398



More information about the libcxx-commits mailing list