[libcxx-commits] [PATCH] D146398: [libcxx] Fix using std::wcout/wcin on Windows with streams configured in wide mode

Wed Apr 19 14:38:45 PDT 2023

tahonermann added a subscriber: ldionne.
tahonermann added a comment.

> Yeah, possibly - any suggestions on where?

I would have to defer to @ldionne and @Mordante for that.

> and if using a `wchar_t` based C++ stream, the narrow char stream gets converted between `wchar_t` and utf8 in narrow chars

Sorry, I wasn't able to parse that. I think I see what you mean though; the locale facet gets used to convert the `wchar_t` sequence to/from `char`.

> On Windows, you get the full unicode fidelity of the console by communicating with it in unicode mode, but actually writing wchars to it on Unix probably doesn't do any good...

I agree writing `wchar_t` values directly wouldn't be good but I would expect the implementation of `fputwc()` to do something reasonable. However...

Hmm, is it really standards conforming to not use the `std::codecvt<wchar_t, char, std::mbstate_t` locale facet for `std::wcout` and friends? We can make an argument that a program that contains a call to `_setmode()` doesn't require conformance, but in the absence of such a call, I would think that we are required to use the imbued facet. As is, with these changes, if a user were to imbue their own facet, it would get ignored. Do you know how the Microsoft or libstdcxx implementations handle that?

================
Comment at: libcxx/src/std_stream.h:128-149
+#if defined(_LIBCPP_WIN32API)
+    char_type __extbuf[__limit];
+#else
     char __extbuf[__limit];
+#endif
     int __nread = _VSTD::max(1, __encoding_);
     for (int __i = 0; __i < __nread; ++__i)
----------------
mstorsjo wrote:
> tahonermann wrote:
> > I think it may be possible to do this in a cleaner way by factoring out the calls to `getc()` and `getwc()` doing something like this:
> > Add some helper methods:
> >   static bool __do_getc(FILE* __fp, char* __pbuf) {
> >       int __c = getc(__fp);
> >       if (__c == EOF)
> >           return false
> >        *__pbuf = static_cast<char>(__c);
> >       return true;
> >   }
> >   #ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
> >   static bool __do_getc(FILE* __fp, wchar_t *pbuf) {
> >       wint_t __c = getwc(__file_);
> >       if (__c == WEOF)
> >           return false;
> >       *__pbuf = static_cast<wchar_t>(__c);
> >       return true;
> >   }
> >   #endif
> > 
> > Change the above code to:
> > 
> >   char_type __extbuf[__limit];
> >   int __nread = _VSTD::max(1, __encoding_);
> >   for (int __i = 0; __i < __nread; ++__i)
> >   {
> >       if (!do_getc(__file_, &extbuf[__i]))
> >           return traits_type::eof();
> >   }
> > 
> > Do similarly for `ungetc`/`ungetwc` and `fwrite`/`fputwc` further below.
> Thanks for the suggestion - that would probably help a bit! We'd still need the ifdef around the `char_type __extbuf[__limit];` though, as we need it as `char` for the non-Windows cases; the `codecvt` conversion in `__cv_->in` below only works if the source buffer is `char` here. But such a factorization can indeed clean up a few of the other cases of conditional code.
We could factor out the use of the `codecvt` facet in the same way. That would avoid the need to use `if constexpr` for deadcode elimination too.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146398/new/

https://reviews.llvm.org/D146398