[PATCH] InstrProf: Calculate a better function hash

Duncan P. N. Exon Smith dexonsmith at apple.com
Mon Mar 24 16:55:08 PDT 2014


On Mar 24, 2014, at 2:10 PM, Chandler Carruth <chandlerc at google.com> wrote:

> +void MapRegionCounters::combineHash(ASTHash Val) {
> +  // Use Bernstein's hash.
> +  Hash = Hash * 33 + Val;
> +}
> 
> So, you care a lot about collisions, but you use Bernstein's hash? I don't get it. This will collide like crazy. Note that Bernstein's hashing algorithm is only ever a "good" distribution for hashing ascii-ish text, and you're hashing dense things.

This will have fewer collisions than:

   ++Hash; // What we’re effectively doing now.

Bernstein *is* weak.  But we’re not using this for a hash table.  In a hash table, you need hashes to be well-distributed, because you need few collisions of Hash%Size for arbitrary Size.

By contrast, this code:

  - compares the full 64 bits of the hash,
  - checks separately that the (mangled) function name matches, and
  - checks separately that the number of counters matches.

It seems improbable to me that we’ll get a lot of collisions in this use case.

> If you care about collisions, I would use the high 64 bits of an md5 (or better yet, all 128 bits). Or some other well researched and understood algorithm.

MD5 looks way too slow (!).  Are you suggesting I should import SpookyHash [1] or CityHash [2]?  They seem like overkill for this use case, but I’m definitely not a hashing expert.

Another idea is to use a simple checksum algorithm, like a variation on Fletcher’s [3].  Any thoughts on that?

[1]: http://burtleburtle.net/bob/hash/spooky.html
[2]: https://code.google.com/p/cityhash/
[3]: http://en.wikipedia.org/wiki/Fletcher%27s_checksum



More information about the cfe-commits mailing list