[libcxx-commits] [libcxx] [libc++] Insert new nodes at the beginning of equal range in std::unordered_multimap (PR #104702)
Arvid Jonasson via libcxx-commits
libcxx-commits at lists.llvm.org
Sun Aug 18 16:23:36 PDT 2024
arvidjonasson wrote:
**Benchmark Results**
I'm running a MacBook Air M2 with 8 GB of RAM. The benchmarks were run with the charger plugged in and battery saver turned off. The data caches are 128 KB L1 and 16 MB L2 (shared).
Results before:
```
Running ./build/libcxx/test/benchmarks/unordered_multimap_operations.bench.out
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
BM_InsertValue/unordered_multimap_uint32/1024 62331 ns 62270 ns 11240
BM_InsertValue/unordered_multimap_uint32_sorted/1024 47805 ns 47654 ns 14820
BM_InsertValue/unordered_multimap_string/1024 270417 ns 270342 ns 2556
BM_InsertValue/unordered_multimap_prefixed_string/1024 272385 ns 272335 ns 2623
BM_InsertValue/unordered_multimap_uint32_max_cardinality_512/1024 71090 ns 70982 ns 9978
BM_InsertValue/unordered_multimap_uint32_max_cardinality_128/1024 76065 ns 75991 ns 9182
BM_InsertValue/unordered_multimap_uint32_max_cardinality_32/1024 89788 ns 89516 ns 7699
BM_InsertValue/unordered_multimap_uint32_max_cardinality_8/1024 138602 ns 138576 ns 4701
BM_InsertValue/unordered_multimap_uint32_max_cardinality_1/1024 590632 ns 589968 ns 1202
BM_InsertValue/unordered_multimap_string_max_cardinality_512/1024 312396 ns 312361 ns 2244
BM_InsertValue/unordered_multimap_string_max_cardinality_128/1024 404650 ns 404602 ns 1734
BM_InsertValue/unordered_multimap_string_max_cardinality_32/1024 762362 ns 762264 ns 914
BM_InsertValue/unordered_multimap_string_max_cardinality_8/1024 2218712 ns 2218479 ns 313
BM_InsertValue/unordered_multimap_string_max_cardinality_1/1024 15270609 ns 15268222 ns 45
BM_EmplaceValue/unordered_multimap_uint32/1024 58471 ns 58467 ns 11884
BM_EmplaceValue/unordered_multimap_uint32_sorted/1024 45723 ns 45719 ns 15412
BM_EmplaceValue/unordered_multimap_string/1024 261983 ns 261916 ns 2677
BM_EmplaceValue/unordered_multimap_prefixed_string/1024 262311 ns 262241 ns 2664
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_512/1024 68535 ns 68527 ns 10128
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_128/1024 73849 ns 73842 ns 9490
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_32/1024 87869 ns 87859 ns 7903
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_8/1024 137971 ns 137957 ns 5066
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_1/1024 581757 ns 581625 ns 1196
BM_EmplaceValue/unordered_multimap_string_max_cardinality_512/1024 310528 ns 310482 ns 2276
BM_EmplaceValue/unordered_multimap_string_max_cardinality_128/1024 405231 ns 405164 ns 1746
BM_EmplaceValue/unordered_multimap_string_max_cardinality_32/1024 765931 ns 765796 ns 913
BM_EmplaceValue/unordered_multimap_string_max_cardinality_8/1024 2229371 ns 2229125 ns 313
BM_EmplaceValue/unordered_multimap_string_max_cardinality_1/1024 15392159 ns 15390261 ns 46
```
Results after:
```
Running ./build/libcxx/test/benchmarks/unordered_multimap_operations.bench.out
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
BM_InsertValue/unordered_multimap_uint32/1024 58530 ns 58522 ns 12253
BM_InsertValue/unordered_multimap_uint32_sorted/1024 46019 ns 46015 ns 15126
BM_InsertValue/unordered_multimap_string/1024 262279 ns 262230 ns 2695
BM_InsertValue/unordered_multimap_prefixed_string/1024 260191 ns 260157 ns 2688
BM_InsertValue/unordered_multimap_uint32_max_cardinality_512/1024 63376 ns 63363 ns 11320
BM_InsertValue/unordered_multimap_uint32_max_cardinality_128/1024 59583 ns 59577 ns 11942
BM_InsertValue/unordered_multimap_uint32_max_cardinality_32/1024 57061 ns 57052 ns 12315
BM_InsertValue/unordered_multimap_uint32_max_cardinality_8/1024 56225 ns 56217 ns 12407
BM_InsertValue/unordered_multimap_uint32_max_cardinality_1/1024 45790 ns 45779 ns 15304
BM_InsertValue/unordered_multimap_string_max_cardinality_512/1024 294407 ns 294365 ns 2402
BM_InsertValue/unordered_multimap_string_max_cardinality_128/1024 303496 ns 303460 ns 2346
BM_InsertValue/unordered_multimap_string_max_cardinality_32/1024 292438 ns 292404 ns 2395
BM_InsertValue/unordered_multimap_string_max_cardinality_8/1024 287342 ns 287327 ns 2423
BM_InsertValue/unordered_multimap_string_max_cardinality_1/1024 262533 ns 262502 ns 2648
BM_EmplaceValue/unordered_multimap_uint32/1024 58538 ns 58532 ns 11972
BM_EmplaceValue/unordered_multimap_uint32_sorted/1024 45809 ns 45804 ns 15193
BM_EmplaceValue/unordered_multimap_string/1024 263236 ns 263203 ns 2691
BM_EmplaceValue/unordered_multimap_prefixed_string/1024 262738 ns 262697 ns 2674
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_512/1024 62269 ns 62261 ns 11204
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_128/1024 59043 ns 59036 ns 11972
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_32/1024 56576 ns 56563 ns 12206
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_8/1024 56446 ns 56443 ns 12290
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_1/1024 45799 ns 45795 ns 15219
BM_EmplaceValue/unordered_multimap_string_max_cardinality_512/1024 293243 ns 293192 ns 2394
BM_EmplaceValue/unordered_multimap_string_max_cardinality_128/1024 298157 ns 298116 ns 2354
BM_EmplaceValue/unordered_multimap_string_max_cardinality_32/1024 292008 ns 291967 ns 2360
BM_EmplaceValue/unordered_multimap_string_max_cardinality_8/1024 287732 ns 287706 ns 2442
BM_EmplaceValue/unordered_multimap_string_max_cardinality_1/1024 262941 ns 262902 ns 2655
```
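With the previous insertion logic, performance degrades sharply as key cardinality drops (that is, as the number of duplicate keys grows), even with only 1024 elements in the map. The effect is most pronounced for string keys: at `max_cardinality_1`, `BM_InsertValue` takes about 15.3 ms for string keys versus about 0.59 ms for `uint32` keys. The new insertion logic does not degrade when there are many duplicate keys; instead, performance improves. For the same `max_cardinality_1` cases, `BM_InsertValue` falls to about 0.26 ms for string keys (roughly 58x faster) and about 0.046 ms for `uint32` keys (roughly 13x faster).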
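To illustrate why the insertion position matters, here is a minimal sketch using a toy singly linked chain that stands in for a single hash bucket. It is not the libc++ `__hash_table` code; `Node`, `Chain`, `insert_at_end_of_equal_range`, `insert_at_front_of_equal_range`, and `comparisons_for_duplicates` are hypothetical names introduced only for this example. It counts key comparisons, which is one plausible reason string keys suffer more than `uint32` keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Toy chain standing in for one hash bucket (hypothetical, not libc++ internals).
struct Node {
    std::uint32_t key;
    std::unique_ptr<Node> next;
};

struct Chain {
    std::unique_ptr<Node> head;
    std::size_t comparisons = 0;  // key comparisons performed by all inserts
};

// Walk past every node with an equal key and insert after the last one.
void insert_at_end_of_equal_range(Chain& c, std::uint32_t key) {
    std::unique_ptr<Node>* link = &c.head;
    bool found = false;
    while (*link) {
        ++c.comparisons;
        const bool match = (*link)->key == key;
        if (found && !match) break;  // just stepped out of the equal range
        found = found || match;
        link = &(*link)->next;
    }
    *link = std::make_unique<Node>(Node{key, std::move(*link)});
}

// Stop at the first node with an equal key and insert in front of it.
void insert_at_front_of_equal_range(Chain& c, std::uint32_t key) {
    std::unique_ptr<Node>* link = &c.head;
    while (*link) {
        ++c.comparisons;
        if ((*link)->key == key) break;  // first equal key: insert right here
        link = &(*link)->next;
    }
    *link = std::make_unique<Node>(Node{key, std::move(*link)});
}

// Insert the same key n times and report how many comparisons were needed.
template <class Insert>
std::size_t comparisons_for_duplicates(Insert insert, std::size_t n) {
    Chain c;
    for (std::size_t i = 0; i < n; ++i) insert(c, 7u);
    assert(n == 0 || c.head != nullptr);
    return c.comparisons;
}
```

In this toy model, inserting one key 1024 times costs n(n-1)/2 = 523776 comparisons when appending to the end of the equal range, but only 1023 when inserting at its front. Both strategies keep equal keys adjacent, which is what `equal_range` relies on.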
https://github.com/llvm/llvm-project/pull/104702