[llvm-branch-commits] [Hashing] Use a non-deterministic seed (PR #96282)

Fangrui Song via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Fri Jun 21 00:21:01 PDT 2024


MaskRay wrote:

> I like the idea, but I'm not sure this should be always enabled. This sounds like a lot of debugging fun when a seed-dependent case is not caught in testing.
> 
> Can we have this part of LLVM_REVERSE_ITERATION instead, which is used to detect similar hash iteration stability issues? We have a build bot for it.

I have fixed a lot of non-determinism bugs in the past few years.
In my experience, non-determinism caused by `DenseMap<A *, X>` is a more frequent issue than overuse of `DenseMap<StringRef, X>`.
`DenseMap<A *, X>` non-determinism issues could have a low failure rate with certain malloc implementations, especially when the bucket count is small.

Potential non-determinism concerns are outweighed by the benefits.
The argument in https://github.com/abseil/abseil-cpp/issues/339 might be useful:

> Part of Abseil's philosophy is engineering at scale. In addition to making it harder to execute a hash flooding attack, another reason we randomize iteration order is that is makes it easier for use to change the underlying implementation if we need to. When we wanted to change our hash function (for example), we found thousands of unit tests that depended on iteration order. We could not just break these tests and say "sorry, you are violating Hyrum's Law" since we would have thousands of angry Google developers. So instead, we fixed the tests and implemented randomization so that next time we wanted to change something in the implementation, this would not be an issue. The need for iteration order determinism is relatively rare compared to the potential wins we get by being free to change the implementation (maybe to make the hash table faster, which could save millions of cycles at scale, for example). By giving users a knob to disable the randomness, we'd be in the same situation all over again.

While -DLLVM_ENABLE_REVERSE_ITERATION=on helps, it adds complexity for bot maintainers and configurations.
This PR allows easier detection of `DenseMap<StringRef, X>` overuse by all contributors.
Eventually, perhaps we can extend this to all `DenseMap<Key, X>` for LLVM_ENABLE_ASSERTIONS=on.


https://github.com/llvm/llvm-project/pull/96282


More information about the llvm-branch-commits mailing list