[LLVMdev] a different hash for APInts

Stuart Hastings stuart at apple.com
Wed Mar 11 15:37:05 PDT 2009


I'm working on a bug where LLVM takes over six minutes to compile a  
module.  After some hand-editing, the module looks like this:

class half
{
private:
   union uif
   {
     unsigned int i;
     float f;
   };
   static const uif _toFloat[1 << 16];
};
const half::uif half::_toFloat[1 << 16] =
{
     {0x00000000}, {0x33800000}, {0x34000000}, {0x34400000},
     {0x34800000}, {0x34a00000}, {0x34c00000}, {0x34e00000},
     {0x35000000}, {0x35100000}, {0x35200000}, {0x35300000},
     {0x35400000}, {0x35500000}, {0x35600000}, {0x35700000},
...
     {0xfffd8000}, {0xfffda000}, {0xfffdc000}, {0xfffde000},
     {0xfffe0000}, {0xfffe2000}, {0xfffe4000}, {0xfffe6000},
     {0xfffe8000}, {0xfffea000}, {0xfffec000}, {0xfffee000},
     {0xffff0000}, {0xffff2000}, {0xffff4000}, {0xffff6000},
     {0xffff8000}, {0xffffa000}, {0xffffc000}, {0xffffe000},
};

The module has over 16K lines, so that's about 64K entries (1<<16).   
There is no executable code in this testcase.

The cause seems to be the APInt::getHashValue() function (near line  
626 of .../lib/Support/APInt.cpp).  Some investigation using the  
debugger suggests that every value was hashing into about three  
buckets, and the DenseMap code was looping excessively over the  
extremely long chains in those three buckets.

Since I'm not an expert on hashing, I picked some random hash function  
I found on the Internet.  (Yes, that sounds like a good way to get a  
disease.  :-)  The new hash function reduces the compile time from  
over six minutes to under four seconds.  That's certainly an  
improvement, but GCC takes less than one second to compile this  
testcase.  (I haven't explored what GCC is doing, but I don't think  
GCC supports arbitrary precision integers, and that probably  
simplifies their internal hashing quite a bit.)

I have NOT exhaustively tested this new hash function.  While the  
improvement on this particular testcase is undeniable, I suspect that  
for every hash function there is a pathological testcase that behaves  
badly.  Ergo, I offer this hash to the list for comment: Is this a  
good choice?  Is "hash ^= hash6432shift(pVal[i]);" a good choice for  
hashing multi-word integers?  Is there a better hash I should use  
instead? Should I be solving the problem in another way entirely?

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: APInt.cpp.diffs.txt
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090311/9cf8f858/attachment.txt>
-------------- next part --------------



Thanks in advance for your comments,

stuart


More information about the llvm-dev mailing list