[all-commits] [llvm/llvm-project] 20341c: [libc++][format] Adds a UTF transcoder.

Mark de Wever via All-commits all-commits at lists.llvm.org
Tue Jul 11 11:28:35 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 20341c3ad6f64a2a61d0e38d0cdafd356a5b6cbb
      https://github.com/llvm/llvm-project/commit/20341c3ad6f64a2a61d0e38d0cdafd356a5b6cbb
  Author: Mark de Wever <koraq at xs4all.nl>
  Date:   2023-07-11 (Tue, 11 Jul 2023)

  Changed paths:
    M libcxx/include/CMakeLists.txt
    M libcxx/include/module.modulemap.in
    A libcxx/include/print
    M libcxx/modules/std/print.cppm
    A libcxx/test/libcxx/input.output/iostream.format/print.fun/transcoding.pass.cpp
    M libcxx/test/libcxx/transitive_includes/cxx03.csv
    M libcxx/test/libcxx/transitive_includes/cxx11.csv
    M libcxx/test/libcxx/transitive_includes/cxx14.csv
    M libcxx/test/libcxx/transitive_includes/cxx17.csv
    M libcxx/test/libcxx/transitive_includes/cxx20.csv
    M libcxx/test/libcxx/transitive_includes/cxx23.csv
    M libcxx/test/libcxx/transitive_includes/cxx26.csv
    M libcxx/utils/ci/run-buildbot

  Log Message:
  -----------
  [libc++][format] Adds a UTF transcoder.

This is a preparation for

  P2093R14 Formatted output

When the output of print is to the terminal it needs to use the native
API. This means transcoding UTF-8 to UTF-16 on Windows. The encoder's
interface is modeled after

 P2728 Unicode in the Library, Part 1: UTF Transcoding

But only the required part for P2093R14 is implemented.

On Windows wchar_t is 16 bits. In order to test on platforms where
wchar_t is 32 bits, the transcoder has support for char16_t. It also
adds a UTF-8 to UTF-32 encoder, which is useful for other tests.

Note it is possible to use <codecvt> for transcoding, but that header is
deprecated. It is better to write new code that is not deprecated; the
hard part, decoding, has already been done. The <codecvt> header also
requires locale support, while the new code works without including
<locale>.

Note the current transcoder implementation can be optimized since it
basically does UTF-8 -> UTF-32 -> UTF-16. The first goal is to have a
working implementation. Since it's not part of the ABI it's possible to
do the optimization later.
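
The two-step route described above (UTF-8 -> UTF-32 -> UTF-16) can be
illustrated with a minimal sketch. This is not the code added by this
commit; the names decode_utf8, encode_utf16 and utf8_to_utf16 are
hypothetical, and replacing malformed input with U+FFFD is an assumption
made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not libc++ API): decodes one code point from the
// UTF-8 sequence starting at s[i] and advances i past the bytes consumed.
// Malformed input yields U+FFFD (an assumption of this sketch).
inline char32_t decode_utf8(std::string_view s, std::size_t& i) {
  const unsigned char lead = static_cast<unsigned char>(s[i++]);
  if (lead < 0x80) return lead;  // single-byte (ASCII) sequence

  int extra = 0;     // number of continuation bytes expected
  char32_t cp = 0;   // code point being assembled
  char32_t min = 0;  // smallest value allowed for this length (rejects overlongs)
  if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; min = 0x80; }
  else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
  else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
  else return 0xFFFD;  // stray continuation byte or invalid lead byte

  for (int k = 0; k < extra; ++k) {
    if (i >= s.size()) return 0xFFFD;  // truncated sequence
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if ((c & 0xC0) != 0x80) return 0xFFFD;  // not a continuation; leave it unconsumed
    cp = (cp << 6) | (c & 0x3F);
    ++i;
  }
  // Reject overlong encodings, values above U+10FFFF, and surrogates.
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0xFFFD;
  return cp;
}

// Hypothetical helper: appends one code point to out as UTF-16.
// Expects a valid Unicode scalar value, as produced by decode_utf8.
inline void encode_utf16(char32_t cp, std::u16string& out) {
  if (cp < 0x10000) {
    out.push_back(static_cast<char16_t>(cp));  // one code unit (BMP)
  } else {
    cp -= 0x10000;  // 20 bits remain, split across a surrogate pair
    out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
    out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
  }
}

// UTF-8 -> UTF-32 -> UTF-16, one code point at a time.
inline std::u16string utf8_to_utf16(std::string_view in) {
  std::u16string out;
  for (std::size_t i = 0; i < in.size();)
    encode_utf16(decode_utf8(in, i), out);
  return out;
}
```

For example, utf8_to_utf16("\xF0\x9F\x98\x80") yields the surrogate pair
0xD83D 0xDE00 for U+1F600. Going through a full code point keeps the
decoder and the encoder independent, at the cost of the intermediate
step that the note above mentions as a later optimization.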

Depends on D149672

Reviewed By: ldionne, tahonermann, #libc

Differential Revision: https://reviews.llvm.org/D150031



