[all-commits] [llvm/llvm-project] e3dea5: [libc++][format] Improves escaping performance. (#...
Mark de Wever via All-commits
all-commits at lists.llvm.org
Sun Apr 28 03:15:46 PDT 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: e3dea5e3410fd6a1e549cfa7021c4f8652b36095
https://github.com/llvm/llvm-project/commit/e3dea5e3410fd6a1e549cfa7021c4f8652b36095
Author: Mark de Wever <koraq at xs4all.nl>
Date: 2024-04-28 (Sun, 28 Apr 2024)
Changed paths:
M libcxx/include/__format/escaped_output_table.h
M libcxx/include/format
A libcxx/test/libcxx/utilities/format/format.string/format.string.std/escaped_output.pass.cpp
M libcxx/utils/generate_escaped_output_table.py
Log Message:
-----------
[libc++][format] Improves escaping performance. (#88533)
The previous patch implemented
- P2713R1 Escaping improvements in std::format
- LWG3965 Incorrect example in [format.string.escaped] p3 for formatting
of combining characters
These changes were correct, but had a size and performance penalty. This
patch improves the size and performance of the previous patch. The
performance is still worse than before since the lookups may require two
property lookups instead of one before implementing the paper. The
changes give a tighter coupling between the Unicode data and the
algorithm. Additional tests are added to notify about changes in future
Unicode updates.
Before
```
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char> 110704 ns 110696 ns 6206
BM_unicode_escaped<char> 101371 ns 101374 ns 6862
BM_cyrillic_escaped<char> 63329 ns 63327 ns 11013
BM_japanese_escaped<char> 41223 ns 41225 ns 16938
BM_emoji_escaped<char> 111022 ns 111021 ns 6304
BM_ascii_escaped<wchar_t> 112441 ns 112443 ns 6231
BM_unicode_escaped<wchar_t> 102776 ns 102779 ns 6813
BM_cyrillic_escaped<wchar_t> 58977 ns 58975 ns 11868
BM_japanese_escaped<wchar_t> 36885 ns 36886 ns 18975
BM_emoji_escaped<wchar_t> 115885 ns 115881 ns 6051
```
The first change is to manually encode the entire last area and make a
manual exception for the 240 excluded entries. This reduced the table
from 1077 to 729 entries and gave the following benchmark results.
```
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char> 104777 ns 104776 ns 6550
BM_unicode_escaped<char> 96980 ns 96982 ns 7238
BM_cyrillic_escaped<char> 60254 ns 60251 ns 11670
BM_japanese_escaped<char> 44452 ns 44452 ns 15734
BM_emoji_escaped<char> 104557 ns 104551 ns 6685
BM_ascii_escaped<wchar_t> 107456 ns 107454 ns 6505
BM_unicode_escaped<wchar_t> 96219 ns 96216 ns 7301
BM_cyrillic_escaped<wchar_t> 56921 ns 56904 ns 12288
BM_japanese_escaped<wchar_t> 39530 ns 39529 ns 17492
BM_emoji_escaped<wchar_t> 108494 ns 108496 ns 6408
```
An entry in the table can only contain 2048 code points. For larger
ranges there are multiple entries split in chunks with a maximum size of
2048 entries. To encode the entire Unicode code point range 21 bits are
required. The manual part starts at 0x323B0 this means all entries in
the table fit in 18 bits. This allows to allocate 3 additional bits for
the range. This allows entries to have 16384 elements. This range always
avoids splitting the range in multiple chunks.
This reduces the number of table elements from 729 to 711 and gives the
following benchmark results.
```
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char> 104289 ns 104289 ns 6619
BM_unicode_escaped<char> 96682 ns 96681 ns 7215
BM_cyrillic_escaped<char> 59673 ns 59673 ns 11732
BM_japanese_escaped<char> 41983 ns 41982 ns 16646
BM_emoji_escaped<char> 104119 ns 104120 ns 6683
BM_ascii_escaped<wchar_t> 104503 ns 104505 ns 6693
BM_unicode_escaped<wchar_t> 93426 ns 93423 ns 7489
BM_cyrillic_escaped<wchar_t> 54858 ns 54859 ns 12742
BM_japanese_escaped<wchar_t> 36385 ns 36384 ns 19259
BM_emoji_escaped<wchar_t> 105608 ns 105610 ns 6592
```
To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications
More information about the All-commits
mailing list