[libcxx-commits] [PATCH] D150031: [libc++][format] Adds a UTF transcoder.

Mark de Wever via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Thu Jun 8 09:09:02 PDT 2023


Mordante marked 3 inline comments as done.
Mordante added a comment.

Thanks for the review!



================
Comment at: libcxx/include/print:44-48
+  requires(sizeof(iter_value_t<_OutIt>) == 2) &&
+          (output_iterator<_OutIt, const char16_t&>
+#    ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
+           || output_iterator<_OutIt, const wchar_t&>
+#    endif
----------------
tahonermann wrote:
> The underlying type of `char16_t` is `uint_least16_t`, so the `sizeof` constraint is technically incorrect here. Similarly, a size of 2 for `wchar_t` doesn't imply UTF-16 if `CHAR_BIT` is 16 (but that is only the case for a few embedded platforms).
> 
> We'll need to see how P2728R0 eventually shakes out, but I think any inference based on character type size should be avoided in favor of an approach that states the desired encoding; e.g., `__encode_utf16(...)`. Constraints could then be added to ensure that the element size is at least 16-bit (`sizeof(T) * CHAR_BIT`).
Ah I totally forgot about `char16_t` not being required to be 16 bits. I see P2728 uses the same assumption, I'll drop Zach a note.

For now I just hard-code it for `char`, `charX_t` and use the _LIBCPP_SHORT_WCHAR define in libc++ for `wchar_t`. This macro is based on the platform, for Windows and AIX 32-bit and it's defined and for others not. Once P2728 lands I will switch to its definitions.


================
Comment at: libcxx/include/print:78
+_LIBCPP_HIDE_FROM_ABI constexpr _OutIt __transcode(_InIt __first, _InIt __last, _OutIt __out_it) {
+  // The __code_point_view has a basic_string_view interface.
+  // When transcoding becomes part of the standard we probably want to
----------------
tahonermann wrote:
> `basic_string_view` or `span`? I would expect more like the latter.
The `__code_point_view` was written for grapheme clustering in `<format>`. For string formatters I use a `basic_string_view` as format argument, so using a `basic_string_view` sounds natural. Since P2728R0 aims at non-character types as code units it would indeed make sense to use a `span`. Especially since `char_traits<int>` will be removed in LLVM 18. For now I keep it this way, once P2728R0 accepted I can change it. Then I probably want to model things closer to that paper.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150031/new/

https://reviews.llvm.org/D150031



More information about the libcxx-commits mailing list