[libcxx-commits] [libcxx] [libc++] Insert new nodes at the beginning of equal range in std::unordered_multimap (PR #104702)

Sun Aug 18 16:23:36 PDT 2024

arvidjonasson wrote:

**Benchmark Results**
I'm running a MacBook Air M2 with 8 GB of RAM. The benchmarks were run with the charger plugged in and battery saver turned off. The data caches are 128 KB L1 and 16 MB L2 (shared).

Results before:
```
Running ./build/libcxx/test/benchmarks/unordered_multimap_operations.bench.out
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
BM_InsertValue/unordered_multimap_uint32/1024                           62331 ns        62270 ns        11240
BM_InsertValue/unordered_multimap_uint32_sorted/1024                    47805 ns        47654 ns        14820
BM_InsertValue/unordered_multimap_string/1024                          270417 ns       270342 ns         2556
BM_InsertValue/unordered_multimap_prefixed_string/1024                 272385 ns       272335 ns         2623
BM_InsertValue/unordered_multimap_uint32_max_cardinality_512/1024       71090 ns        70982 ns         9978
BM_InsertValue/unordered_multimap_uint32_max_cardinality_128/1024       76065 ns        75991 ns         9182
BM_InsertValue/unordered_multimap_uint32_max_cardinality_32/1024        89788 ns        89516 ns         7699
BM_InsertValue/unordered_multimap_uint32_max_cardinality_8/1024        138602 ns       138576 ns         4701
BM_InsertValue/unordered_multimap_uint32_max_cardinality_1/1024        590632 ns       589968 ns         1202
BM_InsertValue/unordered_multimap_string_max_cardinality_512/1024      312396 ns       312361 ns         2244
BM_InsertValue/unordered_multimap_string_max_cardinality_128/1024      404650 ns       404602 ns         1734
BM_InsertValue/unordered_multimap_string_max_cardinality_32/1024       762362 ns       762264 ns          914
BM_InsertValue/unordered_multimap_string_max_cardinality_8/1024       2218712 ns      2218479 ns          313
BM_InsertValue/unordered_multimap_string_max_cardinality_1/1024      15270609 ns     15268222 ns           45
BM_EmplaceValue/unordered_multimap_uint32/1024                          58471 ns        58467 ns        11884
BM_EmplaceValue/unordered_multimap_uint32_sorted/1024                   45723 ns        45719 ns        15412
BM_EmplaceValue/unordered_multimap_string/1024                         261983 ns       261916 ns         2677
BM_EmplaceValue/unordered_multimap_prefixed_string/1024                262311 ns       262241 ns         2664
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_512/1024      68535 ns        68527 ns        10128
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_128/1024      73849 ns        73842 ns         9490
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_32/1024       87869 ns        87859 ns         7903
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_8/1024       137971 ns       137957 ns         5066
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_1/1024       581757 ns       581625 ns         1196
BM_EmplaceValue/unordered_multimap_string_max_cardinality_512/1024     310528 ns       310482 ns         2276
BM_EmplaceValue/unordered_multimap_string_max_cardinality_128/1024     405231 ns       405164 ns         1746
BM_EmplaceValue/unordered_multimap_string_max_cardinality_32/1024      765931 ns       765796 ns          913
BM_EmplaceValue/unordered_multimap_string_max_cardinality_8/1024      2229371 ns      2229125 ns          313
BM_EmplaceValue/unordered_multimap_string_max_cardinality_1/1024     15392159 ns     15390261 ns           46
```

Results after:
```
Running ./build/libcxx/test/benchmarks/unordered_multimap_operations.bench.out
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
BM_InsertValue/unordered_multimap_uint32/1024                           58530 ns        58522 ns        12253
BM_InsertValue/unordered_multimap_uint32_sorted/1024                    46019 ns        46015 ns        15126
BM_InsertValue/unordered_multimap_string/1024                          262279 ns       262230 ns         2695
BM_InsertValue/unordered_multimap_prefixed_string/1024                 260191 ns       260157 ns         2688
BM_InsertValue/unordered_multimap_uint32_max_cardinality_512/1024       63376 ns        63363 ns        11320
BM_InsertValue/unordered_multimap_uint32_max_cardinality_128/1024       59583 ns        59577 ns        11942
BM_InsertValue/unordered_multimap_uint32_max_cardinality_32/1024        57061 ns        57052 ns        12315
BM_InsertValue/unordered_multimap_uint32_max_cardinality_8/1024         56225 ns        56217 ns        12407
BM_InsertValue/unordered_multimap_uint32_max_cardinality_1/1024         45790 ns        45779 ns        15304
BM_InsertValue/unordered_multimap_string_max_cardinality_512/1024      294407 ns       294365 ns         2402
BM_InsertValue/unordered_multimap_string_max_cardinality_128/1024      303496 ns       303460 ns         2346
BM_InsertValue/unordered_multimap_string_max_cardinality_32/1024       292438 ns       292404 ns         2395
BM_InsertValue/unordered_multimap_string_max_cardinality_8/1024        287342 ns       287327 ns         2423
BM_InsertValue/unordered_multimap_string_max_cardinality_1/1024        262533 ns       262502 ns         2648
BM_EmplaceValue/unordered_multimap_uint32/1024                          58538 ns        58532 ns        11972
BM_EmplaceValue/unordered_multimap_uint32_sorted/1024                   45809 ns        45804 ns        15193
BM_EmplaceValue/unordered_multimap_string/1024                         263236 ns       263203 ns         2691
BM_EmplaceValue/unordered_multimap_prefixed_string/1024                262738 ns       262697 ns         2674
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_512/1024      62269 ns        62261 ns        11204
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_128/1024      59043 ns        59036 ns        11972
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_32/1024       56576 ns        56563 ns        12206
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_8/1024        56446 ns        56443 ns        12290
BM_EmplaceValue/unordered_multimap_uint32_max_cardinality_1/1024        45799 ns        45795 ns        15219
BM_EmplaceValue/unordered_multimap_string_max_cardinality_512/1024     293243 ns       293192 ns         2394
BM_EmplaceValue/unordered_multimap_string_max_cardinality_128/1024     298157 ns       298116 ns         2354
BM_EmplaceValue/unordered_multimap_string_max_cardinality_32/1024      292008 ns       291967 ns         2360
BM_EmplaceValue/unordered_multimap_string_max_cardinality_8/1024       287732 ns       287706 ns         2442
BM_EmplaceValue/unordered_multimap_string_max_cardinality_1/1024       262941 ns       262902 ns         2655
```

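With the previous insertion logic, performance degrades sharply as key cardinality drops, even with only 1024 elements in the map. For `BM_InsertValue` with `uint32` keys, the time rises from about 71 µs at a maximum cardinality of 512 to about 591 µs at a cardinality of 1 (roughly 8x). The effect is even more pronounced for string keys, which go from about 312 µs to about 15.3 ms (roughly 49x). The new insertion logic doesn't degrade when there are many duplicate keys. Instead, the times drop slightly as cardinality falls: from about 63 µs to 46 µs for `uint32` keys and from about 294 µs to 263 µs for string keys.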
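
For anyone who wants to reproduce the low-cardinality case outside the benchmark suite, here is a minimal sketch. It is not the code from `unordered_multimap_operations.bench`: the helper names `fill_with_cardinality` and `time_fill_ns` are hypothetical, and it assumes that "max cardinality" means keys are drawn uniformly from that many distinct values.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_map>

// Hypothetical helper (not part of the PR): insert `count` elements whose
// keys are drawn from at most `max_cardinality` distinct values.
// Assumption: keys are uniformly distributed over [0, max_cardinality).
std::unordered_multimap<std::uint32_t, std::uint32_t>
fill_with_cardinality(std::size_t count, std::uint32_t max_cardinality) {
  assert(max_cardinality > 0);
  std::mt19937 gen(42); // fixed seed so runs are repeatable
  std::uniform_int_distribution<std::uint32_t> dist(0, max_cardinality - 1);

  std::unordered_multimap<std::uint32_t, std::uint32_t> map;
  for (std::size_t i = 0; i < count; ++i) {
    // With max_cardinality == 1 every element lands in the same equal range.
    map.emplace(dist(gen), static_cast<std::uint32_t>(i));
  }
  return map;
}

// Hypothetical helper: wall-clock time in nanoseconds for a single fill.
// A real measurement should repeat this many times, as Google Benchmark does.
double time_fill_ns(std::size_t count, std::uint32_t max_cardinality) {
  const auto start = std::chrono::steady_clock::now();
  const auto map = fill_with_cardinality(count, max_cardinality);
  const auto stop = std::chrono::steady_clock::now();

  // Read the result so the fill is not optimized away.
  volatile std::size_t sink = map.size();
  (void)sink;

  return std::chrono::duration<double, std::nano>(stop - start).count();
}
```

Comparing `time_fill_ns(1024, 512)` against `time_fill_ns(1024, 1)` on a standard library build with and without the patch should show the same trend as the tables above, though a single unrepeated timing like this is far noisier than the benchmark numbers.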

https://github.com/llvm/llvm-project/pull/104702