[libcxx-commits] [PATCH] D150031: [libc++][format] Adds a UTF transcoder.
Mark de Wever via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Thu Jun 8 09:09:02 PDT 2023
Mordante marked 3 inline comments as done.
Mordante added a comment.
Thanks for the review!
================
Comment at: libcxx/include/print:44-48
+ requires(sizeof(iter_value_t<_OutIt>) == 2) &&
+ (output_iterator<_OutIt, const char16_t&>
+# ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
+ || output_iterator<_OutIt, const wchar_t&>
+# endif
----------------
tahonermann wrote:
> The underlying type of `char16_t` is `uint_least16_t`, so the `sizeof` constraint is technically incorrect here. Similarly, a size of 2 for `wchar_t` doesn't imply UTF-16 if `CHAR_BIT` is 16 (but that is only the case for a few embedded platforms).
>
> We'll need to see how P2728R0 eventually shakes out, but I think any inference based on character type size should be avoided in favor of an approach that states the desired encoding; e.g., `__encode_utf16(...)`. Constraints could then be added to ensure that the element size is at least 16-bit (`sizeof(T) * CHAR_BIT`).
Ah I totally forgot about `char16_t` not being required to be 16 bits. I see P2728 uses the same assumption, I'll drop Zach a note.
For now I just hard-code it for `char`, `charX_t` and use the _LIBCPP_SHORT_WCHAR define in libc++ for `wchar_t`. This macro is based on the platform, for Windows and AIX 32-bit and it's defined and for others not. Once P2728 lands I will switch to its definitions.
================
Comment at: libcxx/include/print:78
+_LIBCPP_HIDE_FROM_ABI constexpr _OutIt __transcode(_InIt __first, _InIt __last, _OutIt __out_it) {
+ // The __code_point_view has a basic_string_view interface.
+ // When transcoding becomes part of the standard we probably want to
----------------
tahonermann wrote:
> `basic_string_view` or `span`? I would expect more like the latter.
The `__code_point_view` was written for grapheme clustering in `<format>`. For string formatters I use a `basic_string_view` as format argument, so using a `basic_string_view` sounds natural. Since P2728R0 aims at non-character types as code units it would indeed make sense to use a `span`. Especially since `char_traits<int>` will be removed in LLVM 18. For now I keep it this way, once P2728R0 accepted I can change it. Then I probably want to model things closer to that paper.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D150031/new/
https://reviews.llvm.org/D150031
More information about the libcxx-commits
mailing list