On Mon, Feb 13, 2012 at 1:22 AM, Jay Foad <span dir="ltr"><<a href="mailto:jay.foad@gmail.com">jay.foad@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="im">On 13 February 2012 00:59, Talin <<a href="mailto:viridia@gmail.com">viridia@gmail.com</a>> wrote:<br>

> Here's my latest version of Hashing.h, which I propose to add to llvm/ADT.<br>

> Comments welcome and encouraged.<br>

<br>

</div>> /// Adapted from MurmurHash2 by Austin Appleby<br>

<br>

Just out of curiosity, why not MurmurHash3 ? This page seems to<br>

suggest that #2 has some flaw, and #3 is better all round:<br>

<br>

<a href="https://sites.google.com/site/murmurhash/" target="_blank">https://sites.google.com/site/murmurhash/</a><br>

<br></blockquote><div>The main reason is because there's no incremental version of 3. If you look at the source, you'll notice that the very first thing that 3 does is Hash ^= Length, but for the incremental case we don't know the length until we're done. Also, 2 has fewer instructions per block hashed than 3; 3 also requires bit rotations, whereas 2 only uses bit shifts.</div>


<div><br></div><div>Bear in mind that the "flaw" in MurmurHash2 is a fairly esoteric case which (I suspect) LLVM is unlikely to ever encounter in practice. Austin is trying to develop the best possible hash for a wide range of key probability distributions, and his testing methodologies are quite strict.</div>


<div><br></div><div>LLVM's needs, on the other hand, are fairly modest. I'm guessing that most DenseMaps contain less than a few thousand entries. Even a bad hash function wouldn't hurt performance that much, and the time taken to calculate the hash is probably more of a factor in overall performance than getting the optimum distribution of hash values.</div>


<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Would it be possible to use CityHash instead for strings?<br>

<br>

<a href="http://code.google.com/p/cityhash/" target="_blank">http://code.google.com/p/cityhash/</a><br>

<br></blockquote><div>OK by me. My intention is that "Hashing.h" should contain a variety of different hashing algorithms for various specific needs. Right now, I am mainly focusing on hashing of mixed data types - that is, you have some structure which contains pointers, ints, strings, and you want to calculate a hash of the entire struct. I need this because I'm working on improving the uniquing of constants and similar data structures. My next target is to improve the uniquing of MDNodes, but I want to get the hashing stuff squared away first.</div>


<div><br></div><div>It is also my intent that some person who is more of an expert in hashing than I am can do detailed performance analysis under real-world loads (such as compiling actual programs with clang) and tweak and optimize the hashing algorithm, without needing to modify the API and/or all of the places that call it.</div>


<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Thanks,<br>

<font color="#888888">Jay.<br>

</font></blockquote></div><br>-- <br>-- Talin<br>