[llvm-dev] Default hashing function for integers (DenseMapInfo.h)

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Mon Jan 9 23:56:03 PST 2017


Hi François,

> On Jan 9, 2017, at 11:46 PM, Francois Fayard via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> 
>> On Jan 10, 2017, at 2:31 AM, Chris Lattner <sabre at nondot.org <mailto:sabre at nondot.org>> wrote:
>> 
>> As others have pointed out, 37 does have some nice properties (by being prime), but we’ve also had it from the very early days.  I wouldn’t say that it has been extremely well considered at all.
>> 
>> -Chris
> 
> Thanks for your reply. But I am not sure to understand the last sentence. Does it mean that this hash function has been criticized before? Do you also think that it can be improved?
> 
> I am sorry to insist on that, but I don’t see any advantage of using hash(k) = c * k with c being a primer number. What is important is choosing c such that gcd(c, m) = 1 for any size m of the hash table. This is the case for c = 1. As a consequence, I don’t see any benefit in using c = 37 over c = 1. Moreover, all those hash functions are badly considered for a reason : they behave very poorly when the keys are of the form d * k0 + k1.
> 
> I really think that this hash function could be improved. But to know if it's worth it, I would like to know if having a better hash for integer types would benefit LLVM.

Some tests I can suggest is to replace the hash function with your favorite and:

1) Test the impact on the performance (is an LTO link of clang faster? Is `clang -O0` getting faster?), because ultimately I believe real-world impact is the best way to get a change in.
2) Instrument the DenseMap accessors with counters and measure impact on the number of collisions when changing the hash function.

— 
Mehdi

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170109/0b3f3c9e/attachment.html>


More information about the llvm-dev mailing list