[libcxx-commits] [PATCH] D140779: Use hash value checks optimizations consistently
Dmitry Ilvokhin via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Sun May 21 05:38:56 PDT 2023
ilvokhin added a comment.
Thanks for looking into it.
There are at least two different cases this change might affect: cheap-to-compare keys and expensive-to-compare keys.
For cheap-to-compare keys, the intent is to avoid the `std::__constrain_hash` call, which has a modulo operation inside (and which might dominate the comparison in some cases, since the comparison itself is cheap), by comparing the unconstrained hashes first.
For expensive-to-compare keys we are optimizing a slightly different case: the unconstrained hashes do not match, but the constrained hashes do (the keys land in the same bucket). In that case, since the unconstrained hashes do not match, we can skip the expensive `key_eq` call.
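To make the idea concrete, here is a minimal sketch of the bucket-walk pattern (the node layout and names below are illustrative, not libc++'s actual internals):

```
#include <cstddef>

// Hypothetical node layout; libc++'s real node types and names differ.
template <class T>
struct Node {
  T           value;
  std::size_t hash;  // cached unconstrained hash of `value`
  Node*       next;
};

// Walk one bucket: compare the cached unconstrained hash first, and call the
// (potentially expensive) key-equality predicate only when the hashes match.
// This is the pattern `find` already uses; the patch applies it to the
// `insert`/`emplace` paths as well.
template <class T, class Key, class KeyEq>
Node<T>* find_in_bucket(Node<T>* head, const Key& key, std::size_t hash, const KeyEq& key_eq) {
  for (Node<T>* nd = head; nd != nullptr; nd = nd->next)
    if (nd->hash == hash && key_eq(nd->value, key))
      return nd;
  return nullptr;
}
```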
Unfortunately, I believe neither case is covered by the current benchmark set.
Let's look deeper at the expensive-to-compare keys case, as it's easier to analyze (at least for me) and I haven't come up with a good example for cheap-to-compare keys yet. One example of expensive-to-compare keys is strings, but not every set of strings is suitable. Currently, we generate random strings of length 1024 whose characters are drawn uniformly from digits and lower- and upper-case letters. Comparing such strings is very likely to take only a small number of operations, as a mismatch will be found within a relatively short prefix at the beginning of the string.
We can generate strings with a long common prefix to make them more expensive to compare. Below are the results of such a benchmark for strings with a common prefix of length 992 (a sketch of a possible input generator follows the table).
Benchmark                                                     Time      CPU    Time Old    Time New     CPU Old     CPU New
----------------------------------------------------------------------------------------------------------------------------
BM_InsertValue/unordered_set_prefixed_string/1024          -0.0148  -0.0148      309669      305095      308578      304001
BM_InsertValueRehash/unordered_set_prefixed_string/1024    -0.1288  -0.1274      354849      309146      352840      307893
BM_InsertDuplicate/unordered_set_prefixed_string/1024      -0.0281  -0.0288      191892      186503      191225      185719
BM_EmplaceDuplicate/unordered_set_prefixed_string/1024     -0.0543  -0.0540      196555      185887      195738      185160
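For reference, the prefixed inputs could be generated with something like the following sketch (the helper name and constants are mine, not part of the existing benchmark suite):

```
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Generate `count` strings of length `len` that share a common prefix of
// length `prefix_len`, so equality comparisons have to scan deep into the
// strings before finding a mismatch.
std::vector<std::string> getPrefixedStringInputs(std::size_t count,
                                                 std::size_t len = 1024,
                                                 std::size_t prefix_len = 992) {
  static const char alphabet[] =
      "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  std::mt19937 gen{42};
  std::uniform_int_distribution<std::size_t> dist(0, sizeof(alphabet) - 2);

  std::string prefix(prefix_len, 'x');
  std::vector<std::string> inputs;
  inputs.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    std::string s = prefix;
    while (s.size() < len)
      s += alphabet[dist(gen)];
    inputs.push_back(std::move(s));
  }
  return inputs;
}
```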
We would like to have more elements in a single bucket to make this case even worse; to do that, we can increase the load factor. Below is a benchmark for strings with a common prefix of length 992 and a hash set load factor of 3 (a sketch of the container setup follows the table).
Benchmark                                                     Time      CPU    Time Old    Time New     CPU Old     CPU New
----------------------------------------------------------------------------------------------------------------------------
BM_InsertValue/unordered_set_prefixed_string/1024          -0.1401  -0.1399      377803      324867      376322      323659
BM_InsertValueRehash/unordered_set_prefixed_string/1024    -0.0306  -0.0313      392866      380829      391327      379085
BM_InsertDuplicate/unordered_set_prefixed_string/1024      -0.1396  -0.1395      238364      205093      237427      204315
BM_EmplaceDuplicate/unordered_set_prefixed_string/1024     -0.1548  -0.1546      237479      200714      236605      200020
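For this run, the only change to the container setup is raising the maximum load factor before inserting, roughly like this (a sketch, not the exact benchmark code):

```
#include <string>
#include <unordered_set>

std::unordered_set<std::string> make_crowded_set() {
  std::unordered_set<std::string> s;
  // With max_load_factor = 3, rehashing is deferred until there are on
  // average three elements per bucket, so insert/find walk longer chains
  // and the per-node hash check matters more.
  s.max_load_factor(3.0f);
  return s;
}
```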
Overall, with this change we get roughly performance-neutral results for average use cases and a 1% to 15% improvement for one of the corner cases. Moreover, the implementation of the `insert` and `emplace` methods becomes more consistent with the `find` method.
I agree it's not the clearest cut in the world, but it seems like a small improvement to me. Let me know what you think.
Also, I discovered that the benchmarks in the description were taken on a shared CPU; I've updated the description with results from a dedicated CPU to reduce noise.
F27563462: D140779_benchmarks.diff <https://reviews.llvm.org/F27563462>
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140779/new/
https://reviews.llvm.org/D140779