<div dir="ltr">A couple of minor comments about the details of the patch:<div><br></div><div><div>- hash_state state = state.create(s_begin, seed);</div><div>+ hash_state state = hash_state::create(s_begin, seed);</div></div>
<div><br></div><div>Why this change? The existing code is correct already I think?</div><div><br></div><div><div>- return ::llvm::hashing::detail::hash_integer_value(value);</div><div>+ // the cast is necessary for strong enums and avoids compiler warnings</div>
<div><br></div><div>Generally LLVM comments should be formed as prose (starting with a capitalized word and ending with appropriate punctuation). I don't think you need a comment here though, or a separate variable. I would just explicitly cast the value in the argument.</div>
<div><br></div><div>+ const uint64_t n = static_cast<uint64_t>(value);</div><div>+ return ::llvm::hashing::detail::hash_integer_value(n);</div></div><div><br></div><div><br></div></div><div class="gmail_extra"><br>
<br><div class="gmail_quote">On Mon, Feb 3, 2014 at 1:46 PM, Stephan Tolksdorf <span dir="ltr"><<a href="mailto:st@quanttec.com" target="_blank">st@quanttec.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 02.02.14 00:48, Chandler Carruth wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
On Sat, Feb 1, 2014 at 8:35 AM, Stephan Tolksdorf <<a href="mailto:st@quanttec.com" target="_blank">st@quanttec.com</a><br></div><div><div class="h5">
<mailto:<a href="mailto:st@quanttec.com" target="_blank">st@quanttec.com</a>>> wrote:<br>
<br>
Hi,<br>
<br>
Currently the hashing implementation in ADT/Hashing.h produces hash<br>
values on 32-bit platforms that differ from the lower 32-bits of the<br>
hash values produced on 64-bit platforms. It seems the only reason<br>
for this difference is that the uint64_t integer seed is truncated<br>
to size_t. Since the usage of uint64_t and size_t as types for seed<br>
values in the implementation is somewhat inconsistent, I'm<br>
wondering: Was this difference deliberately introduced as a<br>
performance optimization for 32-bit code, or is this maybe just an<br>
implementation artifact?<br>
<br>
I also noted that the implementation relies on implicit conversions<br>
from uint64_t to size_t in several places where hash_code values are<br>
returned, which may trigger compiler warnings on 32-bit platforms.<br>
<br>
<br>
Almost all of this was largely incidental on my part. I didn't spend a<br>
lot of time tuning the 32-bit platform performance. If you have ideas<br>
that you think would make things better, I'm certainly interested. =D<br>
<br>
One thing I would note is that I would prefer to retain the collision<br>
resistance where possible. So I would be leery of reducing the internal<br>
state used. I would more focus on making the outputs friendly for 32-bit<br>
platforms and the inputs consistent.<br>
</div></div></blockquote>
<br>
I've attached a patch that makes the hashing implementation consistently use a 64-bit seed everywhere. With this change the implementation should produce the same hash codes modulo SIZE_MAX + 1 for identical values independent of the platform. (Though hash_combine may still produce different results if the size of an argument type differs between platforms). I suspect the negative performance impact on 32-bit platforms should be small, but I didn't do any benchmarking. With atomics one could probably replace the thread safe local static in get_execution_seed with something that has a little less overhead.<br>
<br>
The patch also removes a FIXME from the set_fixed_execution_seed implementation and rewords the documentation string a bit, since using this function in a multi-threaded program requires some kind of external synchronization anyway.<br>
<br>
Another attached patch implements llvm::sys::Process::<u></u>GetRandomNumber() on Windows. (There's already a Unix implementation.) This function could be useful for implementing the missing random per-execution seed functionality. (Though it might take a little care to deal with Process::GetRandomNumberSeed potentially calling back into hash_combine on Unix.)<br>
<br>
In a little experiment I made, changing the seed per execution seemed to lead to test suite errors for clang's PCH support and other components.<span class="HOEnZb"><font color="#888888"><br>
<br>
- Stephan<br>
<br>
</font></span></blockquote></div><br></div>