[libcxx-commits] [PATCH] D146398: [libcxx] Fix using std::wcout/wcin on Windows with streams configured in wide mode
Tom Honermann via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Wed Apr 19 14:38:45 PDT 2023
tahonermann added a subscriber: ldionne.
tahonermann added a comment.
> Yeah, possibly - any suggestions on where?
I would have to defer to @ldionne and @Mordante for that.
> and if using a `wchar_t` based C++ stream, the narrow char stream gets converted between `wchar_t` and utf8 in narrow chars
Sorry, I wasn't able to parse that. I think I see what you mean though; the locale facet gets used to convert the `wchar_t` sequence to/from `char`.
> On Windows, you get the full unicode fidelity of the console by communicating with it in unicode mode, but actually writing wchars to it on Unix probably doesn't do any good...
I agree writing `wchar_t` values directly wouldn't be good but I would expect the implementation of `fputwc()` to do something reasonable. However...
Hmm, is it really standards conforming to not use the `std::codecvt<wchar_t, char, std::mbstate_t` locale facet for `std::wcout` and friends? We can make an argument that a program that contains a call to `_setmode()` doesn't require conformance, but in the absence of such a call, I would think that we are required to use the imbued facet. As is, with these changes, if a user were to imbue their own facet, it would get ignored. Do you know how the Microsoft or libstdcxx implementations handle that?
================
Comment at: libcxx/src/std_stream.h:128-149
+#if defined(_LIBCPP_WIN32API)
+ char_type __extbuf[__limit];
+#else
char __extbuf[__limit];
+#endif
int __nread = _VSTD::max(1, __encoding_);
for (int __i = 0; __i < __nread; ++__i)
----------------
mstorsjo wrote:
> tahonermann wrote:
> > I think it may be possible to do this in a cleaner way by factoring out the calls to `getc()` and `getwc()` doing something like this:
> > Add some helper methods:
> > static bool __do_getc(FILE* __fp, char* __pbuf) {
> > int __c = getc(__fp);
> > if (__c == EOF)
> > return false
> > *__pbuf = static_cast<char>(__c);
> > return true;
> > }
> > #ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
> > static bool __do_getc(FILE* __fp, wchar_t *pbuf) {
> > wint_t __c = getwc(__file_);
> > if (__c == WEOF)
> > return false;
> > *__pbuf = static_cast<wchar_t>(__c);
> > return true;
> > }
> > #endif
> >
> > Change the above code to:
> >
> > char_type __extbuf[__limit];
> > int __nread = _VSTD::max(1, __encoding_);
> > for (int __i = 0; __i < __nread; ++__i)
> > {
> > if (!do_getc(__file_, &extbuf[__i]))
> > return traits_type::eof();
> > }
> >
> > Do similarly for `ungetc`/`ungetwc` and `fwrite`/`fputwc` further below.
> Thanks for the suggestion - that would probably help a bit! We'd still need the ifdef around the `char_type __extbuf[__limit];` though, as we need it as `char` for the non-Windows cases; the `codecvt` conversion in `__cv_->in` below only works if the source buffer is `char` here. But such a factorization can indeed clean up a few of the other cases of conditional code.
We could factor out the use of the `codecvt` facet in the same way. That would avoid the need to use `if constexpr` for deadcode elimination too.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D146398/new/
https://reviews.llvm.org/D146398
More information about the libcxx-commits
mailing list