[llvm-branch-commits] [libcxx] [libc++][format] Improves escaping performance. (PR #88533)

Louis Dionne via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Tue Apr 23 10:21:06 PDT 2024


================
@@ -305,23 +316,28 @@ def generate_data_tables() -> str:
 
     data = compactPropertyRanges(sorted(properties, key=lambda x: x.lower))
 
-    # The last entry is large. In Unicode 14 it contains the entries
-    # 3134B..0FFFF 912564 elements
-    # This are 446 entries of 1325 entries in the table.
-    # Based on the nature of these entries it is expected they remain for the
-    # forseeable future. Therefore we only store the lower bound of this section.
-    #
-    # When this region becomes substantially smaller we need to investigate
-    # this design.
-    #
-    # Due to P2713R1 Escaping improvements in std::format the range
+    # The output table has two large entries at the end, with a small "gap"
     #   E0100..E01EF  ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
-    # is no longer part of these entries. This causes an increase in the size
-    # of the table.
-    assert data[-1].upper == 0x10FFFF
-    # assert data[-1].upper - data[-1].lower > 900000
-
-    return "\n".join([generate_cpp_data(data[:-1], data[-1].lower)])
+    # Based on Unicode 15.1.0:
+    # - Encoding all these entries in the table requires 1173 entries.
+    # - Manually handling these last two blocks reduces the size to 729 entries.
+    # This not only reduces the binary size, but also improves the performance
+    # by having less elements to search.
+    # The exact entrires may differ between Unicode versions. When these numbers
----------------
ldionne wrote:

```suggestion
    # The exact entries may differ between Unicode versions. When these numbers
```
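
For context, the approach described in the comment under review (keep the large trailing ranges out of the table and test them by hand) could look roughly like the sketch below. This is only an illustration, not the script's actual code: `Range`, `split_tail`, and `contains` are hypothetical names, and the code point values in `RANGES` are made up, apart from the `E0100..E01EF` gap quoted in the diff.

```python
from bisect import bisect_right
from typing import NamedTuple


class Range(NamedTuple):
    lower: int
    upper: int  # inclusive upper bound


def split_tail(ranges, tail_count=2):
    """Split sorted, non-overlapping ranges into a searchable table and a
    short tail of large ranges that is checked by hand."""
    return ranges[:-tail_count], ranges[-tail_count:]


def contains(table, tail, code_point):
    # The few large trailing ranges are tested with plain comparisons.
    for r in tail:
        if r.lower <= code_point <= r.upper:
            return True
    # Binary search the smaller table for the last range whose lower bound
    # is <= code_point. 0x110000 is above every valid code point, so the
    # probe tuple sorts after any range sharing the same lower bound.
    i = bisect_right(table, (code_point, 0x110000)) - 1
    return i >= 0 and code_point <= table[i].upper


# Illustrative data only: two small ranges, then two large ranges separated
# by the E0100..E01EF gap mentioned in the diff.
RANGES = [
    Range(0x00, 0x1F),
    Range(0x7F, 0x9F),
    Range(0x40000, 0xE00FF),   # hypothetical lower bound
    Range(0xE01F0, 0x10FFFF),
]
TABLE, TAIL = split_tail(RANGES)
```

With this split, only `TABLE` needs to be emitted as data. A lookup for a code point inside the gap, such as `contains(TABLE, TAIL, 0xE0100)`, falls through both tail checks and the table search.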

https://github.com/llvm/llvm-project/pull/88533

