OK here's a patch with the latest, including unit tests. I've also tried to make the comments clear that this is intended for the case of "generic" key types, where you are either hashing multiple data types together, or you don't know in advance what the data type will be - for cases such as strings where a tailored algorithm is available, you should use that instead of this.<div>
<br></div><div>There's a couple of things I don't like: First, there's too many levels of inlined function calls - my experience is that compilers aren't as good at inlining nested calls as is often advertised, however I couldn't figure out a way to get the same effect without the nested calls.</div>
<div><br></div><div>Similarly the addBitsImpl helper types are a little too complex and magical for my taste, but again I couldn't think of a better way to automatically detect whether to use the aligned vs. unaligned hashing routine.<br>
<br><div class="gmail_quote">On Wed, Feb 15, 2012 at 12:01 PM, Chris Lattner <span dir="ltr"><<a href="mailto:clattner@apple.com">clattner@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div class="im"><div>On Feb 14, 2012, at 10:47 PM, Talin wrote:</div><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo;color:rgb(0,143,41)"><span style="color:#000000"> </span>/// Add a pointer value</div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo;color:rgb(23,56,245)"><span style="color:#000000"> </span>template<span style="color:#000000"><</span>typename<span style="color:#000000"> T></span></div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> <span style="color:#1738f5">void</span> add(<span style="color:#1738f5">const</span> T *PtrVal) {</div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> addImpl(</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&PtrVal),</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&PtrVal + 1));</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
}</div></div><div><br></div><div>This violates TBAA rules and looks pretty dangerous to expose as public API. Is this really needed? Also, addImpl is dereferencing the pointers as uint32_t's, but there is nothing that guarantees that T is a multiple of 4 bytes. The ArrayRef version has the same problem.</div>
<div><br></div></div></div></blockquote><div>So as far as hashing pointer values is concerned, I was just copying the code from FoldingSet. Since a lot of the keys that we're going to be dealing with are uniqued pointers, it makes sense to be able to calculate a hash of the bit-value of the pointer, rather than hashing the thing pointed to. That being said, renaming it to "addPointer" and adding a comment might be in order. Similarly, I can make the ArrayRef version 'addPointers' and have it take an ArrayRef<T*>.</div>
</div></blockquote><div><br></div></div><div>Ah, Jay was right, I misread this code!</div><div class="im"><div><br></div><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div>Though it is more verbose, I think it would be better to expose a template specialization approach to getting the hash_value of T.</div><div><br></div><div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo;color:rgb(0,143,41)">
<span style="color:#000000"> </span>/// Add a float</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> <span style="color:#1738f5">void</span> add(<span style="color:#1738f5">float</span> Data) {</div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> addImpl(</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&Data),</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&Data + 1));</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
}</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo;min-height:13px"><br></div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo;color:rgb(0,143,41)">
<span style="color:#000000"> </span>/// Add a double</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> <span style="color:#1738f5">void</span> add(<span style="color:#1738f5">double</span> Data) {</div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> addImpl(</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&Data),</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&Data + 1));</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
}</div></div><div><br></div><div>Similarly, these need to go through a union to avoid TBAA problems.</div><div><br></div></div></div></blockquote><div>I'm not sure how that works. Can you give an example?</div></div>
</blockquote><div><br></div></div><div>Just use:</div><div><br></div><div>void add(double Data) {</div><div> union {</div><div> double D; uint64_t I;</div><div> };</div><div> D = Data; </div><div> add(I);</div><div class="im">
<div>}</div><br><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div><br></div><div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> <span style="color:#1738f5">void</span> add(StringRef StrVal) {</div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> addImpl(StrVal.data(), StrVal.size());</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
}</div></div><div><br></div><div>I'm contradicting my stance above about not caring about the implementation :), but is MurmurHash a good hash for string data? The Bernstein hash function works really well and is much cheaper to compute than Murmur. It is used by HashString (and thus by StringMap).</div>
<div><br></div></div></div></blockquote><div>So, MurmurHash is intended for blocks of arbitrary binary data, which may contain character data, integers, or whatever - it's designed to do such a thorough job of mixing the bits that it really doesn't matter what data types you feed it. You are correct that for purely string data, you'd want to use a less expensive algorithm (I'm partial to FNV-1, which is as cheap as the Bernstein hash and is AFAIK more mathematically sound.)</div>
</div></blockquote><br></div></div><div>Ok, so what's the answer? :) We can do different things for ArrayRef<char> and StringRef.</div><div><br></div><font color="#888888"><div>-Chris</div><div><br></div><div>
<br></div><div><br></div></font></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>-- Talin<br>
</div>