[libcxx-commits] [PATCH] D146398: [libcxx] Fix using std::wcout/wcin on Windows with streams configured in wide mode

Wed Apr 19 14:00:42 PDT 2023

mstorsjo added a comment.

In D146398#4281542 <https://reviews.llvm.org/D146398#4281542>, @tahonermann wrote:

>> Unicode mode is for wide print functions (for example, `wprintf`) and is not supported for narrow print functions. Use of a narrow print function on a Unicode mode stream triggers an assert.
>
> That makes it clear that the standard library must restrict itself to use of the wide print functions with streams with those file modes. But since we can't check the file mode, that means programmers that (incorrectly) use `std::cout` following a call to `_setmode()` to put the stream in Unicode mode will experience an error that appears to put the blame on the standard library. Perhaps it is worth documenting these interactions.

Yeah, possibly - any suggestions on where?

> Perhaps we should be using the wide printing functions for `wchar_t` streams on all platforms. `wchar_t` is seldom used on platforms other than Windows; do `std::wcin`, `std::wcout`, and `std::wcerr` actually work in a reasonable way on non-Windows platforms today?

I think the current behaviour is mostly reasonable on other platforms: all actual IO towards stdin/stdout/stderr happens via narrow chars, and when using a `wchar_t`-based C++ stream, the data gets converted between `wchar_t` and UTF-8 in narrow chars. On Windows, you get the full Unicode fidelity of the console by communicating with it in Unicode mode, but actually writing wchars to it on Unix probably doesn't do any good...
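
To illustrate the narrow-char path described above, here is a minimal standalone sketch (not the libc++ implementation) of converting `wchar_t` data to narrow chars through a locale's `codecvt` facet before it would be written to a narrow `FILE*`. The helper name `narrow_via_codecvt` is hypothetical, and the resulting encoding depends on the locale passed in; it is only UTF-8 if that locale uses UTF-8.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption for illustration, not part of libc++):
// converts a wide string to narrow chars using the codecvt facet of `loc`,
// similar in spirit to what a wchar_t stream buffer does before handing
// the bytes to a narrow C stdio stream.
std::string narrow_via_codecvt(const std::wstring& in,
                               const std::locale& loc = std::locale()) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt_t;
    const cvt_t& cvt = std::use_facet<cvt_t>(loc);

    std::mbstate_t state = std::mbstate_t();
    // max_length() bounds how many narrow chars one wide char can produce.
    std::string out(
        in.size() * static_cast<std::size_t>(cvt.max_length()) + 1, '\0');

    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result res =
        cvt.out(state, in.data(), in.data() + in.size(), from_next,
                &out[0], &out[0] + out.size(), to_next);
    if (res != std::codecvt_base::ok)
        throw std::runtime_error("wchar_t to narrow conversion failed");

    // Trim the buffer to the number of narrow chars actually produced.
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

A caller could pass the result to `fwrite` on `stdout`; on Windows in Unicode mode that narrow write is exactly what has to be avoided, which is why the patch uses the wide functions there.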

================
Comment at: libcxx/src/std_stream.h:128-149
+#if defined(_LIBCPP_WIN32API)
+    char_type __extbuf[__limit];
+#else
     char __extbuf[__limit];
+#endif
     int __nread = _VSTD::max(1, __encoding_);
     for (int __i = 0; __i < __nread; ++__i)
----------------
tahonermann wrote:
> I think it may be possible to do this in a cleaner way by factoring out the calls to `getc()` and `getwc()` doing something like this:
> Add some helper methods:
>   static bool __do_getc(FILE* __fp, char* __pbuf) {
>       int __c = getc(__fp);
>       if (__c == EOF)
>           return false;
>       *__pbuf = static_cast<char>(__c);
>       return true;
>   }
>   #ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
>   static bool __do_getc(FILE* __fp, wchar_t* __pbuf) {
>       wint_t __c = getwc(__fp);
>       if (__c == WEOF)
>           return false;
>       *__pbuf = static_cast<wchar_t>(__c);
>       return true;
>   }
>   #endif
> 
> Change the above code to:
> 
>   char_type __extbuf[__limit];
>   int __nread = _VSTD::max(1, __encoding_);
>   for (int __i = 0; __i < __nread; ++__i)
>   {
>       if (!__do_getc(__file_, &__extbuf[__i]))
>           return traits_type::eof();
>   }
> 
> Do similarly for `ungetc`/`ungetwc` and `fwrite`/`fputwc` further below.
Thanks for the suggestion - that would probably help a bit! We'd still need the ifdef around the `char_type __extbuf[__limit];` though, as we need it as `char` for the non-Windows cases; the `codecvt` conversion in `__cv_->in` below only works if the source buffer is `char` here. But such a factorization can indeed clean up a few of the other cases of conditional code.
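
As a rough standalone sketch of what such a factorization could look like (an illustration, not the actual patch), the overloads below wrap `getc`/`getwc` and `ungetc`/`ungetwc` so that the calling loop is the same for both character types. The names `do_getc`, `do_ungetc` and `read_units` are hypothetical and deliberately avoid the reserved `__` prefix; the choice of buffer element type (`char` vs `char_type`) is left to the caller, since that ifdef is still needed.

```cpp
#include <cstddef>
#include <cstdio>
#include <cwchar>

// Hypothetical helpers: read one unit, returning false at end of file.
inline bool do_getc(std::FILE* fp, char* pbuf) {
    int c = std::getc(fp);
    if (c == EOF)
        return false;
    *pbuf = static_cast<char>(c);
    return true;
}

inline bool do_getc(std::FILE* fp, wchar_t* pbuf) {
    std::wint_t c = std::getwc(fp);
    if (c == WEOF)
        return false;
    *pbuf = static_cast<wchar_t>(c);
    return true;
}

// Hypothetical helpers: push one unit back, returning false on failure.
inline bool do_ungetc(char c, std::FILE* fp) {
    return std::ungetc(static_cast<unsigned char>(c), fp) != EOF;
}

inline bool do_ungetc(wchar_t c, std::FILE* fp) {
    return std::ungetwc(static_cast<std::wint_t>(c), fp) != WEOF;
}

// The calling loop no longer needs conditional code: overload resolution
// picks the narrow or wide function based on the buffer's element type.
template <class CharT>
bool read_units(std::FILE* fp, CharT* buf, int nread) {
    for (int i = 0; i < nread; ++i) {
        if (!do_getc(fp, &buf[i]))
            return false;
    }
    return true;
}
```

With this shape, the only remaining platform-specific piece in the read path would be the declaration of the buffer itself.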

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146398/new/

https://reviews.llvm.org/D146398