<div>// -- 'hash_code' class is an opaque type representing the hash code for some</div><div>// data. It is the intended product of hashing, and can be used to implement</div><div><br></div><div>vs.</div><div>
<br></div><div><div>// -- 'hash_value' is a function designed to be overloaded for each</div><div>// user-defined type which wishes to be used within a hashing context. It</div></div><div><br></div><div>The first paragraph has a hanging indent but the second and third don't.</div>
<div><br></div><div><div>// benchmarked at over 8.5 GiB/s for large keys, and <20 cycles/</div></div><div><br></div><div>Trailing punctuation.</div><div><br></div><div><div>#include <cassert></div><div>#include <cstring></div>
<div>#include <algorithm></div><div>#include <utility></div><div>#include <iterator></div></div><div><br></div><div>Sort.</div><div><br></div><div><div> /// \brief Form a hash code directly from a numerical value.</div>
<div> hash_code(size_t value) : value(value) {}</div></div><div><br></div><div>why not assert that the value is not the null or invalid value? I realize you'd need to have a private constructor for the null and invalid ones then.</div>
<div><br></div><div><div>// All of the implementation details of actually computing the various hash</div><div>// code values are held within this namespace. These routines are included in</div><div>// the header file mainly to allow inlining and constant propagation.</div>
<div>namespace hashing { namespace detail {</div></div><div><br></div><div>One namespace per line please, in LLVM.</div><div><br></div><div>[I only skimmed the implementation details of the hashing. It all looks plausible enough.]</div>
<div><br></div><div><div>} } // End hashing detail namespaces.</div></div><div><br></div><div>Again, one per line. You do this multiple times in Hashing.h, please fix them all.</div><div><br></div><div><div>/// reproduce *exactly* a specific behavior. To that end, we provide a function</div>
<div>/// which will forcible set the seed to a fixed value. This must be done at the</div></div><div><br></div><div>forcible -> forcibly</div><div><br></div><div><div> template <typename T> struct is_hashable_data</div>
<div> : integral_constant<bool, ((is_integral<T>::value ||</div><div> is_pointer<T>::value)</div><div> && 64 % sizeof(T) == 0)> {};</div>
</div><div><br></div><div>Optional: I object to the leading && and propose this reflowed text:</div><div><br></div><div><div> template <typename T> struct is_hashable_data</div><div> : integral_constant<bool,</div>
<div> ((is_integral<T>::value || is_pointer<T>::value) &&</div><div> 64 % sizeof(T) == 0)> {};</div></div><div>.</div><div><br></div><div><div>#if __has_feature(__cxx_variadic_templates__)</div>
</div><div><br></div><div>I'm pretty sure non-clang compilers will bark at that unless you #define'd __has_feature(X) to 0?</div><div><br></div><div><div>--- a/unittests/ADT/HashingTest.cpp</div><div>+++ b/unittests/ADT/HashingTest.cpp</div>
<div>@@ -13,45 +13,295 @@</div><div> </div><div> #include "gtest/gtest.h"</div><div> #include "llvm/ADT/Hashing.h"</div><div>+#include "llvm/Support/DataTypes.h"</div><div>+#include <vector></div>
<div>+#include <list></div><div>+#include <deque></div><div>+#include <map></div></div><div><br></div><div>Sort.</div><div><br></div><div><div>+ // Helper for test code to print hash codes.</div><div>+ void PrintTo(const hash_code &code, ::std::ostream *os) {</div>
</div><div><br></div><div>What's with the extra leading :: before std::?</div><div><br></div><div><div>+ namespace hashing { namespace detail {</div><div>+ template <> struct is_hashable_data<LargeTestInteger> : true_type {};</div>
<div>+ } } // End hashing detail namespace.</div><div>+} // End LLVM namespace.</div></div><div><br></div><div>A reminder about 1 namespace-per-line, but also the comments on ending namespaces are odd. They usually look like "} // end namespace llvm" or "} // namespace llvm", and I prefer the latter.</div>
<div><br></div><div>Nick</div><br><div class="gmail_quote">On 29 February 2012 01:35, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Thanks for the feedback thus far!<br><br>I've updated the header file, and enclosed a complete patch against LLVM. This passes all the regtests, and I'll be doing more thorough testing of LLVM itself with the patch. I've included some basic unit tests, but could probably do more here... Not sure it's worth delaying the initial submission though, as the best testing is to use a hash testing suite...<br>
<br>I've addressed the comments from Nadav, but there may be more minor stylistic issues or typos. Please keep pointing them out! I appreciate the help there.<br><br>I've also corrected my stub for the execution-seed to be more correct and to include the ability to override it during program startup. It still doesn't go the final step to make it actually change on each execution, but is now otherwise identical. (In particular, it is no longer a compile-time constant that can be folded.) This regressed the performance a tiny bit, however...<br>
<br>There are several improvements to the implementation of the hashing algorithm itself. The way the hashing included the seed hurt performance quite a bit. I've re-worked it to be much lighter weight, and even after the execution seed fix above slowed things down, this speeds everything up more. We shave 4ns off the 4 to 8 byte case, bringing performance above that of essentially every high quality hashing algorithm...<br>
<br>I still think we can do more, but it's already much faster than the existing LLVM one except for the issue Tobias pointed out w/ modulo-4 key sizes. I'm going to investigate this, but it may be a consequence of a weakness in that algorithm.<br>
<br>I've re-run the SMHasher quality testing suite and the results are very good. There are a few problems identified, but these are long-known problems with CityHash in general, and have proven to thus far not be a cause of real-world issues... I'd like to fix them, but it doesn't seem a high priority yet.<br>
<br>Finally, I've run some rough initial numbers for x86 32-bit, and it's surprisingly good. The relative speeds of this algorithm and others don't seem to change much. A bit suspicious of that, so going to do more thorough analysis.
<br>_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
<br></blockquote></div><br>