[libcxx-commits] [PATCH] D146398: [libcxx] Fix using std::wcout/wcin on Windows with streams configured in wide mode
Martin Storsjö via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Sat May 13 14:38:05 PDT 2023
mstorsjo added inline comments.
================
Comment at: libcxx/docs/UsingLibcxx.rst:534-538
+If the mode of the standard streams ``stdout``, ``stderr`` or ``stdin`` is
+changed to Unicode mode with ``_setmode()``, all interaction with those streams
+must happen in Unicode mode. This means that ``std::cout``, ``std::cerr`` or
+``std::cin`` respectively can't be used at that point, only ``std::wcout``,
+``std::wcerr`` or ``std::wcin``.
----------------
tahonermann wrote:
> Microsoft's [[ https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=msvc-170 | documentation for `_setmode()` ]] states that use of a narrow print function on a Unicode stream will result in an assertion failure (when linked with a debug C run-time library) and that is the case for `printf()` and `putc()`. However, it isn't the case for Microsoft's implementation of `std::cout` and friends; they misbehave (the narrow `char` buffer gets interpreted as holding a sequence of `wchar_t` which, of course, does nothing useful), but they don't assert. It seems they must bypass the C functions that assert; probably by calling `fwrite()` (which doesn't assert).
>
> I'm wondering if we should say more about what the ramifications are if the narrow streams are used in Unicode mode; saying "can't be used" doesn't communicate much. The suggested edit attempts to better explain what happens.
Thanks, the suggested text here seems fine!
================
Comment at: libcxx/docs/UsingLibcxx.rst:540-542
+The same goes if a custom locale is imbued on the wide char streams; at that
+point, all actual IO towards the stream is made with regular narrow chars,
+which doesn't work if the respective stream has been set to Unicode mode.
----------------
tahonermann wrote:
> I wonder if we should say something about `std::ios::sync_with_stdio()` here. Microsoft's implementation requires `sync_with_stdio(false)` for an imbued locale to have any effect. Is libc++ consistent with that requirement? If not, perhaps it should be?
Your suggested text here doesn't quite seem to match what we do.
On other platforms than Windows, then yes, libcxx does match the current or default locale by default. However, the core of this patch is exactly that on Windows, we _don't_ care about the current/default locale for the wide streams, we just pass things through by default (doing the writes as `wchar_t`s).
If we do imbue a locale, the writes end up done with `char`s instead, which doesn't behave as expected if the stream has been set in Unicode mode. So the combination of `imbue()` and `_setmode()` doesn't work, which was what I tried to say here before.
I took your suggestion but rewrote it to mean what I tried to say here, please have another look.
Then secondly, libcxx's `sync_with_stdio()` seems to be more or less a no-op so it isn't needed here right now at least. Not sure what we should say about it here...
================
Comment at: libcxx/src/std_stream.h:139
+#ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
+static bool __do_ungetc(std::wint_t __c, FILE *__fp) {
+ if (ungetwc(__c, __fp) == WEOF)
----------------
tahonermann wrote:
> mstorsjo wrote:
> > I'm a bit unsure about whether this is ok to do; can we generally assume that `std::wint_t` and `int` are distinct different data types? If we'd just pass the raw `char`/`wchar_t` here, we can't access `traits_type::to_int_type` here. Or should we do that and directly use `char_traits<type>::to_int_type` explicitly instead of going via the `traits_type` typedef within the class?
> Good catch; I don't think we can depend on them being distinct types. It looks like there are cases where an "int_type" value is passed, so I think we should pass the right type. Here are a few other suggestions to choose from. Since this is pretty isolated code, I don't have strong opinions at all. Just make sure the intent is either obvious in the code or add a comment otherwise.
> - Pass a dummy argument solely for overload resolution.
> - Make these a template and pass the character type as a template parameter.
I'll amend the patch with an extra dummy argument here - that's probably the most straightforward fix.
================
Comment at: libcxx/src/std_stream.h:327
+static bool __do_fputc(char __c, FILE* __fp) {
+ if (fwrite(&__c, sizeof(__c), 1, __fp) != 1)
+ return false;
----------------
tahonermann wrote:
>
Hmm, I primarily use `fwrite` here since that's what was used before anyway - if we'd have used `fputc` for that before, I would have gone with that here.
While this is called when we write one char at a time, we also have a separate special case - if `__always_noconv_` is set, we `fwrite` a whole array of chars too.
While this might avoid triggering asserts in the CRT, it still mostly outputs gibberish in that case. So I'm not sure if I'd comment this saying that this is explicitly done this way to make that case "work" somehow, since it doesn't quite work anyway.
================
Comment at: libcxx/src/std_stream.h:396-398
+ // For wchar_t on Windows, don't do fwrite(), but write characters one
+ // at a time with fputwc; that works both when stdout is in the default
+ // mode and when it is set to unicode mode.
----------------
tahonermann wrote:
>
Thanks, amending the commit with this change.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D146398/new/
https://reviews.llvm.org/D146398
More information about the libcxx-commits
mailing list