[libcxx-commits] [PATCH] D140779: Use hash value checks optimizations consistently

Dmitry Ilvokhin via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Sun May 21 05:38:56 PDT 2023


ilvokhin added a comment.

Thanks for looking into it.

There are at least two different cases this change might affect: cheap to compare keys and expensive to compare keys.

For cheap-to-compare keys, the intent is to avoid the `std::__constrain_hash` call, which has a modulo operation inside (and which might dominate the comparison in some cases, since the comparison itself is cheap), by comparing the unconstrained hashes first.
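To make the terminology concrete, here is an illustrative, simplified sketch of what "constraining" a hash means in this context (loosely modelled on libc++'s `std::__constrain_hash`, which also special-cases power-of-two bucket counts; this is not the actual implementation): mapping the full hash value to a bucket index requires a modulo, whereas comparing two unconstrained hashes is a plain integer comparison.

```cpp
#include <cstddef>

// Simplified stand-in for std::__constrain_hash: reduce a raw hash value
// to a bucket index. The modulo here is the relatively expensive operation
// that comparing unconstrained hashes lets us avoid per node.
std::size_t constrain_hash(std::size_t h, std::size_t bucket_count) {
    return h % bucket_count;
}
```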

For expensive-to-compare keys we are optimizing a slightly different case: the unconstrained hashes do not match, but the constrained hashes do. In that case, since the hashes differ, we can skip the expensive `key_eq` call.
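A hypothetical sketch of the lookup pattern being discussed (not the actual libc++ code; node layout and names are invented for illustration): each node caches its unconstrained hash, so non-matching keys can be rejected with an integer comparison before the possibly expensive `key_eq` is ever called.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Invented node type for illustration: a bucket is a singly linked chain,
// and each node caches the unconstrained hash of its key.
struct Node {
    std::size_t hash;  // cached unconstrained hash
    std::string key;
    Node* next;
};

bool bucket_contains(Node* head, const std::string& key, std::size_t hash,
                     const std::equal_to<std::string>& key_eq) {
    for (Node* p = head; p != nullptr; p = p->next) {
        // Compare cached unconstrained hashes first: if they differ, the
        // keys cannot be equal, so the expensive key_eq call is skipped.
        if (p->hash == hash && key_eq(p->key, key))
            return true;
    }
    return false;
}
```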

Unfortunately, I believe neither case is covered by the current benchmark set.

Let's look deeper at the expensive-to-compare keys case, as it's easier to analyze (at least for me), and I haven't come up with a good example for cheap-to-compare keys yet. One example of expensive-to-compare keys is strings, but not every string set is suitable. Currently, we generate random strings of length 1024 whose characters are drawn from a uniform distribution (over digits and lower- and upper-case letters). Comparing such strings will very likely take only a small number of operations, since a mismatch will be found within a relatively short prefix at the beginning of the string.

We can generate strings with a long common prefix to make them more expensive to compare. Below are results from such a benchmark for strings with a common prefix of length 992.

  Benchmark                                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
  ------------------------------------------------------------------------------------------------------------------------------------------------------
  BM_InsertValue/unordered_set_prefixed_string/1024                      -0.0148         -0.0148        309669        305095        308578        304001
  BM_InsertValueRehash/unordered_set_prefixed_string/1024                -0.1288         -0.1274        354849        309146        352840        307893
  BM_InsertDuplicate/unordered_set_prefixed_string/1024                  -0.0281         -0.0288        191892        186503        191225        185719
  BM_EmplaceDuplicate/unordered_set_prefixed_string/1024                 -0.0543         -0.0540        196555        185887        195738        185160
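For reference, a hypothetical sketch of how such keys could be generated (the actual benchmark helper may differ; the function name, seed, and alphabet are assumptions): a fixed 992-character common prefix followed by a random suffix, giving 1024-character strings whose comparisons must scan most of the string before finding a mismatch.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Generate `count` strings of the given total length sharing a long common
// prefix, so that operator== has to walk past the prefix before it can
// find a differing character.
std::vector<std::string> make_prefixed_strings(std::size_t count,
                                               std::size_t length = 1024,
                                               std::size_t prefix_len = 992) {
    static const char alphabet[] =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::mt19937 gen(42);  // fixed seed for reproducible benchmark input
    std::uniform_int_distribution<std::size_t> dist(0, sizeof(alphabet) - 2);

    const std::string prefix(prefix_len, 'x');  // shared by every key
    std::vector<std::string> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        std::string s = prefix;
        while (s.size() < length)
            s += alphabet[dist(gen)];  // only the tail differs
        result.push_back(std::move(s));
    }
    return result;
}
```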

We would like to have more elements in a single bucket to make this case even worse; for that we can increase the load factor. Below are results for strings with a common prefix of length 992 and a hash set load factor of 3.

  Benchmark                                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
  ------------------------------------------------------------------------------------------------------------------------------------------------------
  BM_InsertValue/unordered_set_prefixed_string/1024                      -0.1401         -0.1399        377803        324867        376322        323659
  BM_InsertValueRehash/unordered_set_prefixed_string/1024                -0.0306         -0.0313        392866        380829        391327        379085
  BM_InsertDuplicate/unordered_set_prefixed_string/1024                  -0.1396         -0.1395        238364        205093        237427        204315
  BM_EmplaceDuplicate/unordered_set_prefixed_string/1024                 -0.1548         -0.1546        237479        200714        236605        200020
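Raising the load factor for such a benchmark is a one-liner; a minimal sketch (illustrative only, not necessarily how the benchmark configures it):

```cpp
#include <string>
#include <unordered_set>

// Build a set that tolerates ~3 elements per bucket on average, so lookups
// and inserts walk longer chains and the hash pre-check matters more.
std::unordered_set<std::string> make_crowded_set() {
    std::unordered_set<std::string> s;
    s.max_load_factor(3.0f);
    return s;
}
```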

Overall, with this change we get roughly performance-neutral results for average use cases and a 1% to 15% improvement for one of the corner cases. Moreover, the implementation of the `insert` and `emplace` methods will be more consistent with the `find` method.

I agree it's not the clearest cut in the world, but it seems like a small improvement to me. Let me know what you think.

Also, I discovered that the benchmarks from the description were taken on a shared CPU; I updated the description with results from a dedicated CPU to reduce noise.

F27563462: D140779_benchmarks.diff <https://reviews.llvm.org/F27563462>


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140779/new/

https://reviews.llvm.org/D140779


