[libcxx-commits] [libcxx] [libc++][format] Switches to Unicode 15.1. (PR #86543)

Mark de Wever via libcxx-commits libcxx-commits at lists.llvm.org
Tue Mar 26 00:13:27 PDT 2024


================
@@ -53,6 +53,21 @@ static_assert(count_entries(cluster::__property::__LVT) == 10773);
 static_assert(count_entries(cluster::__property::__ZWJ) == 1);
 static_assert(count_entries(cluster::__property::__Extended_Pictographic) == 3537);
 
+namespace inCB = std::__indic_conjunct_break;
+constexpr int count_entries(inCB::__property property) {
+  return std::transform_reduce(
+      std::begin(inCB::__entries), std::end(inCB::__entries), 0, std::plus{}, [property](auto entry) {
+        if (static_cast<inCB::__property>(entry & 0b11) != property)
+          return 0;
+
+        return 1 + static_cast<int>((entry >> 2) & 0b1'1111'1111);
+      });
+}
+
+static_assert(count_entries(inCB::__property::__Linker) == 6);
+static_assert(count_entries(inCB::__property::__Consonant) == 240);
+static_assert(count_entries(inCB::__property::__Extend) == 884);
----------------
mordante wrote:

@ldionne These tests verify that we properly parse the new properties.

The data tables for this test are in "extended_grapheme_cluster.h". That file is generated from "GraphemeBreakTest.txt", the Extended Grapheme Cluster break test provided by Unicode. I use this test to validate that the state machine is correct. The test was updated in Unicode 15.1.0 to cover the new GB9c rule.
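
For readers unfamiliar with the packed table format, here is a minimal sketch of what the counting helper in the hunk above does. It is not the libc++ implementation. The names `Property`, `make_entry`, and `entries` are hypothetical, and the code point ranges are chosen only for illustration. The hunk confirms the low two bits (property) and the following nine bits (range size minus one); placing the first code point in the remaining upper bits is an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for the property enum; the numeric values are assumptions.
enum class Property : std::uint8_t { Linker = 0, Consonant = 1, Extend = 2, None = 3 };

// Packs one entry. Assumed layout:
//   bits [0, 1]  property
//   bits [2, 10] number of code points in the range, minus one
//   bits [11, ..] first code point of the range (assumption, not shown in the hunk)
constexpr std::uint32_t make_entry(std::uint32_t first, std::uint32_t last, Property property) {
  return (first << 11) | ((last - first) << 2) | static_cast<std::uint32_t>(property);
}

// Illustrative table, not the generated Unicode data.
constexpr std::uint32_t entries[] = {
    make_entry(0x0300, 0x034E, Property::Extend),    // 79 code points
    make_entry(0x0915, 0x0939, Property::Consonant), // 37 code points
    make_entry(0x094D, 0x094D, Property::Linker),    // 1 code point
    make_entry(0x09CD, 0x09CD, Property::Linker),    // 1 code point
};

// Sums the sizes of all ranges that carry the requested property,
// using the same masks and shifts as the helper in the diff.
constexpr int count_entries(Property property) {
  int total = 0;
  for (std::uint32_t entry : entries) {
    if (static_cast<Property>(entry & 0b11) == property)
      total += 1 + static_cast<int>((entry >> 2) & 0b1'1111'1111);
  }
  return total;
}

static_assert(count_entries(Property::Linker) == 2);
static_assert(count_entries(Property::Consonant) == 37);
static_assert(count_entries(Property::Extend) == 79);
```

Because each entry stores the range size rather than a single code point, the asserted totals count code points, not table rows. That is why a mismatch in the generated table shows up as a wrong total even when the number of entries is unchanged.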

https://github.com/llvm/llvm-project/pull/86543

