<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div>On Feb 14, 2012, at 10:47 PM, Talin wrote:</div><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto; "><div style="word-wrap:break-word"><div><div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo;color:rgb(0,143,41)"><span style="color:#000000"> </span>/// Add a pointer value</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo;color:rgb(23,56,245)"><span style="color:#000000"> </span>template<span style="color:#000000"><</span>typename<span style="color:#000000"> T></span></div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> <span style="color:#1738f5">void</span> add(<span style="color:#1738f5">const</span> T *PtrVal) {</div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> addImpl(</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&PtrVal),</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&PtrVal + 1));</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
}</div></div><div><br></div><div>This violates TBAA rules and looks pretty dangerous to expose as public API. Is this really needed? Also, addImpl is dereferencing the pointers as uint32_t's, but there is nothing that guarantees that T is a multiple of 4 bytes. The ArrayRef version has the same problem.</div>
<div><br></div></div></div></blockquote><div>So as far as hashing pointer values is concerned, I was just copying the code from FoldingSet. Since a lot of the keys that we're going to be dealing with are uniqued pointers, it makes sense to be able to calculate a hash of the bit-value of the pointer, rather than hashing the thing pointed to. That being said, renaming it to "addPointer" and adding a comment might be in order. Similarly, I can make the ArrayRef version 'addPointers' and have it take an ArrayRef<T*>.</div></div></blockquote><div><br></div><div>Ah, Jay was right, I misread this code!</div><div><br></div><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto; "><div style="word-wrap:break-word"><div><div>Though it is more verbose, I think it would be better to expose a template specialization approach to getting the hash_value of T.</div><div><br></div><div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo;color:rgb(0,143,41)">
<span style="color:#000000"> </span>/// Add a float</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> <span style="color:#1738f5">void</span> add(<span style="color:#1738f5">float</span> Data) {</div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> addImpl(</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&Data),</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&Data + 1));</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
}</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo;min-height:13px"><br></div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo;color:rgb(0,143,41)">
<span style="color:#000000"> </span>/// Add a double</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> <span style="color:#1738f5">void</span> add(<span style="color:#1738f5">double</span> Data) {</div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> addImpl(</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&Data),</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
<span style="color:#1738f5">reinterpret_cast</span><<span style="color:#1738f5">const</span> uint32_t *>(&Data + 1));</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
}</div></div><div><br></div><div>Similarly, these need to go through a union to avoid TBAA problems.</div><div><br></div></div></div></blockquote><div>I'm not sure how that works. Can you give an example?</div></div></blockquote><div><br></div><div>Just use:</div><div><br></div><div>void add(double Data) {</div><div> union {</div><div> double D; uint64_t I;</div><div> };</div><div> D = Data; </div><div> add(I);</div><div>}</div><br><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto; ">
<div style="word-wrap:break-word"><div><div><br></div><div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> <span style="color:#1738f5">void</span> add(StringRef StrVal) {</div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo"> addImpl(StrVal.data(), StrVal.size());</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font:normal normal normal 11px/normal Menlo">
}</div></div><div><br></div><div>I'm contradicting my stance above about not caring about the implementation :), but is MurmurHash a good hash for string data? The Bernstein hash function works really well and is much cheaper to compute than Murmur. It is used by HashString (and thus by StringMap).</div>
<div><br></div></div></div></blockquote><div>So, MurmurHash is intended for blocks of arbitrary binary data, which may contain character data, integers, or whatever - it's designed to do such a thorough job of mixing the bits that it really doesn't matter what data types you feed it. You are correct that for purely string data, you'd want to use a less expensive algorithm (I'm partial to FNV-1, which is as cheap as the Bernstein hash and is AFAIK more mathematically sound.)</div></div></blockquote><br></div><div>Ok, so what's the answer? :) We can do different things for ArrayRef<char> and StringRef.</div><div><br></div><div>-Chris</div><div><br></div><div><br></div><div><br></div></body></html>