[LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)

Wed Feb 29 21:22:17 PST 2012

//  -- 'hash_code' class is an opaque type representing the hash code for
some
//     data. It is the intended product of hashing, and can be used to
implement

vs.

//  -- 'hash_value' is a function designed to be overloaded for each
//  user-defined type which wishes to be used within a hashing context. It

The first paragraph has a hanging indent but the second and third don't.

//  benchmarked at over 8.5 GiB/s for large keys, and <20 cycles/

Trailing punctuation.

#include <cassert>
#include <cstring>
#include <algorithm>
#include <utility>
#include <iterator>

Sort.

  /// \brief Form a hash code directly from a numerical value.
  hash_code(size_t value) : value(value) {}

why not assert that the value is not the null or invalid value? I realize
you'd need to have a private constructor for the null and invalid ones then.

// All of the implementation details of actually computing the various hash
// code values are held within this namespace. These routines are included
in
// the header file mainly to allow inlining and constant propagation.
namespace hashing { namespace detail {

One namespace per line please, in LLVM.

[I only skimmed the implementation details of the hashing. It all looks
plausible enough.]

} } // End hashing detail namespaces.

Again, one per line. You do this multiple times in Hashing.h, please fix
them all.

/// reproduce *exactly* a specific behavior. To that end, we provide a
function
/// which will forcible set the seed to a fixed value. This must be done at
the

forcible -> forcibly

  template <typename T> struct is_hashable_data
    : integral_constant<bool, ((is_integral<T>::value ||
                                is_pointer<T>::value)
                               && 64 % sizeof(T) == 0)> {};

Optional: I object to the leading && and propose this reflowed text:

  template <typename T> struct is_hashable_data
    : integral_constant<bool,
                        ((is_integral<T>::value || is_pointer<T>::value) &&
                         64 % sizeof(T) == 0)> {};
.

#if __has_feature(__cxx_variadic_templates__)

I'm pretty sure non-clang compilers will bark at that unless you #define'd
__has_feature(X) to 0?

--- a/unittests/ADT/HashingTest.cpp
+++ b/unittests/ADT/HashingTest.cpp
@@ -13,45 +13,295 @@

 #include "gtest/gtest.h"
 #include "llvm/ADT/Hashing.h"
+#include "llvm/Support/DataTypes.h"
+#include <vector>
+#include <list>
+#include <deque>
+#include <map>

Sort.

+  // Helper for test code to print hash codes.
+  void PrintTo(const hash_code &code, ::std::ostream *os) {

What's with the extra leading :: before std::?

+  namespace hashing { namespace detail {
+  template <> struct is_hashable_data<LargeTestInteger> : true_type {};
+  } } // End hashing detail namespace.
+} // End LLVM namespace.

A reminder about 1 namespace-per-line, but also the comments on ending
namespaces are odd. They usually look like "} // end namespace llvm" or "}
 // namespace llvm", and I prefer the latter.

Nick

On 29 February 2012 01:35, Chandler Carruth <chandlerc at google.com> wrote:

> Thanks for the feedback thus far!
>
> I've updated the header file, and enclosed a complete patch against LLVM.
> This passes all the regtests, and I'll be doing more thorough testing of
> LLVM itself with the patch. I've included some basic unit tests, but could
> probably do more here... Not sure it's worth delaying the initial
> submission though, as the best testing is to use a hash testing suite...
>
> I've addressed the comments from Nadav, but there may be more minor
> stylistic issues or typos. Please keep pointing them out! I appreciate the
> help there.
>
> I've also corrected my stub for the execution-seed to be more correct and
> to include the ability to override it during program startup. It still
> doesn't go the final step to make it actually change on each execution, but
> is now otherwise identical. (In particular, it is no longer a compile-time
> constant that can be folded.) This regressed the performance a tiny bit,
> however...
>
> There are several improvements to the implementation of the hashing
> algorithm itself. The way the hashing included the seed hurt performance
> quite a bit. I've re-worked it to be much lighter weight, and even after
> the execution seed fix above slowed things down, this speeds everything up
> more. We shave 4ns off the 4 to 8 byte case, bringing performance above
> that of essentially every high quality hashing algorithm...
>
> I still think we can do more, but it's already much faster than the
> existing LLVM one except for the issue Tobias pointed out w/ modulo-4 key
> sizes. I'm going to investigate this, but it may be a consequence of a
> weakness in that algorithm.
>
> I've re-run the SMHasher quality testing suite and the results are very
> good. There are a few problems identified, but these are long-known
> problems with CityHash in general, and have proven to thus far not be a
> cause of real-world issues... I'd like to fix them, but it doesn't seem a
> high priority yet.
>
> Finally, I've run some rough initial numbers for x86 32-bit, and it's
> surprisingly good. The relative speeds of this algorithm and others don't
> seem to change much. A bit suspicious of that, so going to do more thorough
> analysis.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120229/cd68bf58/attachment.html>