<div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif">How about an FNV hash? It is very simple to implement and fast, and it will be stronger at detecting changes.</div><div class="gmail_default" style="font-family:verdana,sans-serif">


<br></div><div class="gmail_default" style="font-family:verdana,sans-serif">Should the hashing computation be split out of PGO into its own utility? A general way to hash functions may have other uses; MergeFunc in particular comes to mind.</div>
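<div class="gmail_default" style="font-family:verdana,sans-serif"><br>Here is a minimal sketch of the FNV idea. It is written as a standalone class so it could live outside PGO as suggested above. It assumes <code>ASTHash</code> is a 64-bit unsigned value and folds each value in byte by byte, using the standard 64-bit FNV-1a offset basis and prime. The names <code>FNV1aHasher</code> and <code>fnv1aOf</code> are hypothetical, not existing LLVM or Clang APIs.</div>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: ASTHash is a 64-bit unsigned value, as in the patch under review.
using ASTHash = std::uint64_t;

// Hypothetical replacement for combineHash() using 64-bit FNV-1a instead of
// Bernstein's hash. FNV-1a is defined over bytes, so each 64-bit value is
// folded in one byte at a time, least significant byte first.
class FNV1aHasher {
public:
  void combineHash(ASTHash Val) {
    for (int I = 0; I < 8; ++I) {
      Hash ^= (Val >> (8 * I)) & 0xFF; // XOR in the next byte...
      Hash *= Prime;                    // ...then multiply by the FNV prime.
    }
  }

  std::uint64_t getHash() const { return Hash; }

private:
  static constexpr std::uint64_t OffsetBasis = 0xcbf29ce484222325ULL;
  static constexpr std::uint64_t Prime = 0x100000001b3ULL;
  std::uint64_t Hash = OffsetBasis;
};

// Hypothetical convenience helper: hash a whole sequence of values.
inline std::uint64_t fnv1aOf(const std::vector<ASTHash> &Vals) {
  FNV1aHasher H;
  for (ASTHash V : Vals)
    H.combineHash(V);
  return H.getHash();
}
```

<div class="gmail_default" style="font-family:verdana,sans-serif">Each multiply spreads the newly XORed byte into the higher bits of the state. It is still a non-cryptographic hash, but it costs one XOR and one multiply per byte and needs no external code.</div>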

<div class="gmail_default" style="font-family:verdana,sans-serif"><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Mar 24, 2014 at 4:55 PM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class=""><br>

On Mar 24, 2014, at 2:10 PM, Chandler Carruth <<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>> wrote:<br>

<br>

> +void MapRegionCounters::combineHash(ASTHash Val) {<br>

> +  // Use Bernstein's hash.<br>

> +  Hash = Hash * 33 + Val;<br>

> +}<br>

><br>

> So, you care a lot about collisions, but you use Bernstein's hash? I don't get it. This will collide like crazy. Note that Bernstein's hashing algorithm is only ever a "good" distribution for hashing ascii-ish text, and you're hashing dense things.<br>
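<br>
To make the collision concern concrete, here is a minimal sketch of the combining step on its own. It assumes the running hash starts at 0 and the values are 64-bit, and the helper name <code>bernsteinOf</code> is hypothetical:<br>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The combining step from the patch, Hash = Hash * 33 + Val, applied to a
// whole sequence. Assumption: the running hash starts at 0.
inline std::uint64_t bernsteinOf(const std::vector<std::uint64_t> &Vals) {
  std::uint64_t Hash = 0;
  for (std::uint64_t Val : Vals)
    Hash = Hash * 33 + Val;
  return Hash;
}
```

Because Hash * 33 + Val is linear, the pair (A, B) collides with (A + 1, B - 33) whenever B is at least 33. For example, <code>bernsteinOf({1, 0})</code> and <code>bernsteinOf({0, 33})</code> both return 33. Starting from 0 also makes leading zero values invisible, so <code>{0, 5}</code> and <code>{5}</code> hash alike.<br>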


<br>

</div>This will have fewer collisions than:<br>

<br>

   ++Hash; // What we’re effectively doing now.<br>
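<br>
For comparison, here is a sketch of the two schemes side by side. The helper names are hypothetical, and Bernstein's hash is again assumed to start from 0. Incrementing records only how many values were seen, while Bernstein's hash also depends on the values and their order:<br>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// What the code effectively does today, per the reply: each value only bumps
// a counter, so the "hash" is just the number of values seen.
inline std::uint64_t countingOf(const std::vector<std::uint64_t> &Vals) {
  std::uint64_t Hash = 0;
  for (std::uint64_t Val : Vals) {
    (void)Val; // The value itself never reaches the hash.
    ++Hash;
  }
  return Hash;
}

// Bernstein's combining step from the patch (assumed initial value: 0).
inline std::uint64_t bernsteinOf(const std::vector<std::uint64_t> &Vals) {
  std::uint64_t Hash = 0;
  for (std::uint64_t Val : Vals)
    Hash = Hash * 33 + Val;
  return Hash;
}
```

Here <code>countingOf</code> returns 3 for both <code>{1, 2, 3}</code> and <code>{3, 2, 1}</code>, whereas <code>bernsteinOf</code> returns 1158 and 3334.<br>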

<br>

Bernstein *is* weak.  But we’re not using this for a hash table.  In a hash table, you need hashes to be well-distributed, because you need few collisions of Hash%Size for arbitrary Size.<br>

<br>

By contrast, this code:<br>

<br>

  - compares the full 64 bits,<br>

  - checks separately that the (mangled) function name matches, and<br>

  - checks separately that the number of counters matches.<br>
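<br>
Here is a sketch of that three-part check. It assumes a hypothetical record layout, and the struct and field names are illustrative rather than the actual profile data format. Because the whole 64-bit value is compared rather than Hash % Size, a stale profile is accepted only if its hash, mangled name, and counter count all match:<br>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical shape of one function's profile record (illustrative only).
struct FunctionRecord {
  std::string MangledName;
  std::uint64_t Hash;
  std::uint64_t NumCounters;
};

// The profile data is used for a function only if all three checks pass:
// the full 64-bit hash, the mangled name, and the number of counters.
inline bool recordMatches(const FunctionRecord &Profile,
                          const FunctionRecord &Current) {
  return Profile.Hash == Current.Hash &&
         Profile.MangledName == Current.MangledName &&
         Profile.NumCounters == Current.NumCounters;
}
```

For example, a record whose name and hash match but whose counter count differs is rejected.<br>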

<br>

It seems improbable to me that we’ll get a lot of collisions in this use case.<br>

<div class=""><br>

> If you care about collisions, I would use the high 64 bits of an md5 (or better yet, all 128 bits). Or some other well researched and understood algorithm.<br>

<br>

</div>MD5 looks way too slow (!).  Are you suggesting I should import SpookyHash [1] or CityHash [2]?  They seem like overkill for this use case, but I’m definitely not a hashing expert.<br>

<br>

Another idea is to use a simple checksum algorithm, like a variation on Fletcher’s [3].  Any thoughts on that?<br>
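<br>
Here is a minimal sketch of a Fletcher-style checksum adapted to 64-bit values. It assumes each value is split into two 32-bit words (low word first) and that both running sums are kept modulo 2^32 - 1, as in Fletcher-64. The class and helper names are hypothetical:<br>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fletcher-64-style checksum over 64-bit values. Each value is split into
// two 32-bit words (low word first); both sums are kept modulo 2^32 - 1.
class Fletcher64 {
public:
  void combine(std::uint64_t Val) {
    addWord(static_cast<std::uint32_t>(Val));       // low 32 bits
    addWord(static_cast<std::uint32_t>(Val >> 32)); // high 32 bits
  }

  // Second (position-weighted) sum in the high half, plain sum in the low half.
  std::uint64_t getChecksum() const { return (Sum2 << 32) | Sum1; }

private:
  static constexpr std::uint64_t Mod = 0xFFFFFFFFULL; // 2^32 - 1

  void addWord(std::uint32_t Word) {
    Sum1 = (Sum1 + Word) % Mod; // running sum of the words
    Sum2 = (Sum2 + Sum1) % Mod; // running sum of the running sums
  }

  std::uint64_t Sum1 = 0;
  std::uint64_t Sum2 = 0;
};

// Hypothetical convenience helper: checksum a whole sequence of values.
inline std::uint64_t fletcherOf(const std::vector<std::uint64_t> &Vals) {
  Fletcher64 F;
  for (std::uint64_t V : Vals)
    F.combine(V);
  return F.getChecksum();
}
```

The second sum makes the result order-dependent, so <code>{1, 0}</code> and <code>{0, 33}</code> get different checksums even though they collide under Bernstein's hash starting from 0. One known weakness carries over from Fletcher's design. Modulo 2^32 - 1, a word of all zeros and a word of all ones add the same amount, so <code>{0}</code> and <code>{0xFFFFFFFF}</code> produce the same checksum.<br>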

<br>

[1]: <a href="http://burtleburtle.net/bob/hash/spooky.html" target="_blank">http://burtleburtle.net/bob/hash/spooky.html</a><br>

[2]: <a href="https://code.google.com/p/cityhash/" target="_blank">https://code.google.com/p/cityhash/</a><br>

[3]: <a href="http://en.wikipedia.org/wiki/Fletcher%27s_checksum" target="_blank">http://en.wikipedia.org/wiki/Fletcher%27s_checksum</a><br>

_______________________________________________<br>

cfe-commits mailing list<br>

<a href="mailto:cfe-commits@cs.uiuc.edu">cfe-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits</a><br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><div><font size="4" face="arial black, sans-serif" style="background-color:rgb(0,0,0)" color="#b45f06"> Raúl E. Silvera </font></div><div><br></div>

</div>

</div>