[libcxx-commits] [PATCH] D126971: [libc++] Implements Unicode grapheme clustering

Mark de Wever via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Sun Jul 3 10:39:33 PDT 2022


Mordante created this revision.
Herald added subscribers: arichardson, mgorny.
Herald added a project: All.
Mordante updated this revision to Diff 434071.
Mordante added a comment.
Mordante updated this revision to Diff 434254.
Mordante updated this revision to Diff 434255.
Mordante updated this revision to Diff 434256.
Mordante updated this revision to Diff 434273.
Mordante updated this revision to Diff 434327.
Mordante updated this revision to Diff 434427.
Mordante updated this revision to Diff 441926.
Mordante updated this revision to Diff 441945.
Mordante updated this revision to Diff 441947.
Mordante updated this revision to Diff 441966.
Mordante added a subscriber: STL_MSFT.
Mordante published this revision for review.
Mordante added reviewers: ldionne, vitaut.
Herald added a project: libc++.
Herald added a subscriber: libcxx-commits.
Herald added a reviewer: libc++.

Fixes some CI issues.
I also noticed that there is room to compact the data even more. In some cases
Unicode splits one larger range into several subranges to add additional
comments. By merging these ranges the number of entries in the tables is
reduced.
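
To illustrate the merging idea above, here is a minimal sketch. It is not the
code from the patch: the `Range` struct, its fields, and `merge_adjacent` are
hypothetical names, and it assumes the entries are sorted and non-overlapping.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical table entry: an inclusive code point range plus a property tag.
struct Range {
  std::uint32_t first;
  std::uint32_t last;
  std::uint8_t property;
};

// Merges neighbouring entries that carry the same property when the second
// entry starts directly after the first one ends.
// Assumption: the input is sorted by `first` and the ranges do not overlap.
std::vector<Range> merge_adjacent(const std::vector<Range>& in) {
  std::vector<Range> out;
  for (const Range& r : in) {
    // Check the sorted, non-overlapping precondition.
    assert(out.empty() || out.back().last < r.first);
    if (!out.empty() && out.back().property == r.property &&
        out.back().last + 1 == r.first)
      out.back().last = r.last; // Extend the previous entry.
    else
      out.push_back(r);         // Different property or a gap: keep separate.
  }
  return out;
}
```

Ranges separated by a gap are left alone even when their property matches,
since merging them would change the property of the code points in the gap.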


Mordante added a comment.

Try to fix the CI.


Mordante added a comment.

Attempts to fix the CI.


Mordante added a comment.

Fixes a shadow warning.


Mordante added a comment.

Fixes ASAN errors.


Mordante added a comment.

Improves end of input handling. That should fix the CI errors.


Mordante added a comment.

Use a better way to store the data.
This improves the speed by 7% over the previous version, and by 4% over the original 6-byte algorithm.

Note the code and comments need more polish.


Mordante added a comment.

Improve and polish the code. Replace libc++ specific tests with generic tests.


Mordante added a comment.

Fixing GCC build failures.


Mordante added a comment.

Disable a test in GCC since it times out.


Mordante added a comment.

Minor cleanups and addresses @STL_MSFT's review comments while upstreaming some of these changes.



================
Comment at: libcxx/include/__format/parser_std_format_spec.h:1195
   const _CharT* __pos =
       __detail::__estimate_column_width_fast(__first, __limit);
 
----------------
D125606 "Not specific to this diff but I think it would be cleaner to fold Unicode/non-Unicode handling into __estimate_column_width and have a single check and copy here."


Note this is a proof-of-concept, uploaded to test the CI. Specifically,
to validate whether the code for platforms where
sizeof(wchar_t) == sizeof(uint16_t) works as expected.
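
For context, a rough sketch of why the wchar_t width matters. This is not the
patch's code: `wchar_is_utf16` and `decode_one` are hypothetical names, and the
helper assumes well-formed input. When wchar_t is 16 bits wide, a wide string
holds UTF-16 code units, so a code point outside the BMP takes two elements.

```cpp
#include <cstddef>
#include <cstdint>

// True on platforms where wchar_t is 16 bits wide, so wide strings hold
// UTF-16 code units instead of whole code points.
inline constexpr bool wchar_is_utf16 = sizeof(wchar_t) == sizeof(std::uint16_t);

// Hypothetical helper: decodes the code point starting at s[i] and advances i.
// Assumptions: i < n and the input is well-formed; an unpaired surrogate is
// returned unchanged instead of being reported as an error.
constexpr char32_t decode_one(const wchar_t* s, std::size_t n, std::size_t& i) {
  char32_t c = static_cast<char32_t>(s[i++]);
  if constexpr (wchar_is_utf16) {
    // A high surrogate followed by a low surrogate forms one code point.
    if (c >= 0xD800 && c <= 0xDBFF && i < n) {
      char32_t low = static_cast<char32_t>(s[i]);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        ++i;
        c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
      }
    }
  }
  return c;
}
```

With 32-bit wchar_t the surrogate branch is discarded at compile time and every
element is taken as one code point.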


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D126971

Files:
  libcxx/.clang-format
  libcxx/benchmarks/std_format_spec_string_unicode.bench.cpp
  libcxx/include/CMakeLists.txt
  libcxx/include/__format/extended_grapheme_cluster_table.h
  libcxx/include/__format/formatter_integral.h
  libcxx/include/__format/formatter_output.h
  libcxx/include/__format/formatter_string.h
  libcxx/include/__format/parser_std_format_spec.h
  libcxx/include/__format/unicode.h
  libcxx/include/format
  libcxx/include/module.modulemap.in
  libcxx/test/libcxx/private_headers.verify.cpp
  libcxx/test/libcxx/utilities/format/format.string/format.string.std/extended_grapheme_cluster.h
  libcxx/test/libcxx/utilities/format/format.string/format.string.std/extended_grapheme_cluster.pass.cpp
  libcxx/test/libcxx/utilities/format/format.string/format.string.std/std_format_spec_string_non_unicode.pass.cpp
  libcxx/test/libcxx/utilities/format/format.string/format.string.std/std_format_spec_string_unicode.pass.cpp
  libcxx/test/std/utilities/format/format.functions/ascii.pass.cpp
  libcxx/test/std/utilities/format/format.functions/format_tests.h
  libcxx/test/std/utilities/format/format.functions/unicode.pass.cpp
  libcxx/utils/generate_extended_grapheme_cluster_table.py
  libcxx/utils/generate_extended_grapheme_cluster_test.py

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D126971.441966.patch
Type: text/x-patch
Size: 287576 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/libcxx-commits/attachments/20220703/4fdfc8bc/attachment-0001.bin>

