[LLVMdev] a different hash for APInts

Cédric Venet cedric.venet at laposte.net
Thu Mar 12 03:14:22 PDT 2009


Stuart Hastings a écrit :
>
> {
>     {0x00000000}, {0x33800000}, {0x34000000}, {0x34400000},
>     {0x34800000}, {0x34a00000}, {0x34c00000}, {0x34e00000},
>     {0x35000000}, {0x35100000}, {0x35200000}, {0x35300000},
>     {0x35400000}, {0x35500000}, {0x35600000}, {0x35700000},
> ...
>     {0xfffd8000}, {0xfffda000}, {0xfffdc000}, {0xfffde000},
>     {0xfffe0000}, {0xfffe2000}, {0xfffe4000}, {0xfffe6000},
>     {0xfffe8000}, {0xfffea000}, {0xfffec000}, {0xfffee000},
>     {0xffff0000}, {0xffff2000}, {0xffff4000}, {0xffff6000},
>     {0xffff8000}, {0xffffa000}, {0xffffc000}, {0xffffe000},
> };
>
> The module has over 16K lines, so that's about 64K entries (1<<16).  
> There is no executable code in this testcase.
>
> The cause seems to be the APInt::getHashValue() function (near line 
> 626 of .../lib/Support/APInt.cpp).  Some investigation using the 
> debugger suggests that every value was hashing into about three 
> buckets, and the DenseMap code was looping excessively over the 
> extremely long chains in those three buckets.
>

 From what I can see the old hash function can be good, if the number of 
bucket is not a power of two. The problem is that dense map use 64 
buckets initially and grow but doubling this number. So what happen here 
(I think, I only did a fast reading of the code) is the hash for a value 
i is about h=(i+32), and the bucket selection is done by h%64 or 
h%(1<<n) so only the low bits are taken into acount. on your exemple 
since all the low bits are zero, you have a lot of collision.
Instead of changing the hash, changing the number of bucket would be a 
better/simpler solution. From what I know of hashmap, having a number of 
bucket equal to a power of two is uterly stupid, ie: it mean that you 
only use the lower part of the hash. So why compute a hash on 32bits 
then... The only reason to do this is for faster modulo, but here it is 
not the case has the bucket number is a variable. Usually, prime number 
are used  as bucket number (AFAIK). But in your case, simply using an 
odd number would be a lot better.





More information about the llvm-dev mailing list