[LLVMdev] a different hash for APInts

Owen Anderson resistor at mac.com
Wed Mar 11 15:56:47 PDT 2009


Other ones worth considering might be lookup3 and SuperFastHash.   You  
could also consider fixing our current hash function (which is an FNV- 
variant) to use the "official" FNV magic constants and see if they  
perform better.  What matters most is which one generates the fewest  
collisions on testcases we care about, with the lowest overhead.  I  
think some benchmarking is in order.

lookup3: http://www.burtleburtle.net/bob/c/lookup3.c
SuperFastHash: http://www.azillionmonkeys.com/qed/hash.html
FNV: http://isthe.com/chongo/tech/comp/fnv/

--Owen

On Mar 11, 2009, at 3:37 PM, Stuart Hastings wrote:

> I'm working on a bug where LLVM takes over six minutes to compile a  
> module.  After some hand-editing, the module looks like this:
>
> class half
> {
> private:
>  union uif
>  {
>    unsigned int i;
>    float f;
>  };
>  static const uif _toFloat[1 << 16];
> };
> const half::uif half::_toFloat[1 << 16] =
> {
>    {0x00000000}, {0x33800000}, {0x34000000}, {0x34400000},
>    {0x34800000}, {0x34a00000}, {0x34c00000}, {0x34e00000},
>    {0x35000000}, {0x35100000}, {0x35200000}, {0x35300000},
>    {0x35400000}, {0x35500000}, {0x35600000}, {0x35700000},
> ...
>    {0xfffd8000}, {0xfffda000}, {0xfffdc000}, {0xfffde000},
>    {0xfffe0000}, {0xfffe2000}, {0xfffe4000}, {0xfffe6000},
>    {0xfffe8000}, {0xfffea000}, {0xfffec000}, {0xfffee000},
>    {0xffff0000}, {0xffff2000}, {0xffff4000}, {0xffff6000},
>    {0xffff8000}, {0xffffa000}, {0xffffc000}, {0xffffe000},
> };
>
> The module has over 16K lines, so that's about 64K entries (1<<16).   
> There is no executable code in this testcase.
>
> The cause seems to be the APInt::getHashValue() function (near line  
> 626 of .../lib/Support/APInt.cpp).  Some investigation using the  
> debugger suggests that every value was hashing into about three  
> buckets, and the DenseMap code was looping excessively over the  
> extremely long chains in those three buckets.
>
> Since I'm not an expert on hashing, I picked some random hash  
> function I found on the Internet.  (Yes, that sounds like a good way  
> to get a disease.  :-)  The new hash function reduces the compile  
> time from over six minutes to under four seconds.  That's certainly  
> an improvement, but GCC takes less than one second to compile this  
> testcase.  (I haven't explored what GCC is doing, but I don't think  
> GCC supports arbitrary precision integers, and that probably  
> simplifies their internal hashing quite a bit.)
>
> I have NOT exhaustively tested this new hash function.  While the  
> improvement on this particular testcase is undeniable, I suspect  
> that for every hash function there is a pathological testcase that  
> behaves badly.  Ergo, I offer this hash to the list for comment: Is  
> this a good choice?  Is "hash ^= hash6432shift(pVal[i]);" a good  
> choice for hashing multi-word integers?  Is there a better hash I  
> should use instead? Should I be solving the problem in another way  
> entirely?
>
> <APInt.cpp.diffs.txt>
>
>
> Thanks in advance for your comments,
>
> stuart_______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev




More information about the llvm-dev mailing list