[LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)

Wed Feb 29 03:47:39 PST 2012

> (But if we only care about
> running on x86-64, this won't be a problem.)

Please, no. We have a cortex-a9 native buildbot already in lab.llvm.org and
as manufacturers emit faster ARM chips we (ARM) will want to have LLVM run
native on them.

You've also got the OpenCL use case etc.

Please bear ARM in mind.

Cheers,

James

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Jay Foad
Sent: 29 February 2012 10:52
To: Chandler Carruth
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Proposed implementation of N3333 hashing interfaces
for LLVM (and possible libc++)

On 29 February 2012 09:35, Chandler Carruth <chandlerc at google.com> wrote:
> I still think we can do more, but it's already much faster than the
existing LLVM one except for the issue Tobias pointed out w/ modulo-4 key
sizes. I'm going to investigate this

OK, but this is a VERY big exception! Almost any non-string data
anyone wants to hash will be a multiple of 4 bytes in length.

> //===-- llvm/ADT/Hashing.h - Utilities for hashing --------------*- C++
-*-===//

Doesn't this belong in ../Support/ ? No-one's ever explained the
difference to me; I just assumed that generic data types go in ADT/
and everything else goes in Support/.

> //
> //                     The LLVM Compiler Infrastructure
> //
> // This file is distributed under the University of Illinois Open Source
> // License. See LICENSE.TXT for details.
> //
>
//===----------------------------------------------------------------------=
==//
> //
> // This file implements the newly proposed standard C++ interfaces for
hashing
> // arbitrary data and building hash functions for user-defined types. This
> // interface was originally proposed in N3333[1] and is currently under
review
> // for inclusion in a future TR and/or standard.
> //
> // The primary interfaces provide are comprised of one type and three
functions:

"provided"

> //
> //  -- 'hash_code' class is an opaque type representing the hash code for
some
> //     data. It is the intended product of hashing, and can be used to
implement
> //     hash tables, checksumming, and other common uses of hashes. It is
not an
> //     integer type (although it can be converted to one) because it is
risky
> //     to assume much about the internals of a hash_code. In particular,
each
> //     execution of the program has a high probability of producing a
different
> //     hash_code for a given input. Thus their values are not stable to
save or
> //     persist, and should only be used during the execution for the
> //     construction of hashing datastructures.
> //
> //  -- 'hash_value' is a function designed to be overloaded for each
> //  user-defined type which wishes to be used within a hashing context. It
> //  should be overloaded within the user-defined type's namespace and
found via
> //  ADL. Overloads for primitive types are provided by this library.
> //
> //  -- 'hash_combine' and 'hash_combine_range' are functions designed to
aid
> //  programmers in easily and intuitively combining a set of data into a
single
> //  hash_code for their object. They should only logically be used within
the
> //  implementation of a 'hash_value' routine or similar context.
> //
> //  Note that 'hash_combine_range' contains very special logic for hashing
> //  a contiguous array of integers or pointers. This logic is *extremely*
fast,
> //  on a modern Intel "Gainestown" Xeon (Nehalem uarch) @2.2 GHz, these
were
> //  benchmarked at over 8.5 GiB/s for large keys, and <20 cycles/

20 cycles per what? Don't keep us in suspense!

> //
>
//===----------------------------------------------------------------------=
==//
>
> #ifndef LLVM_ADT_HASHING_H
> #define LLVM_ADT_HASHING_H
>
> #include "llvm/ADT/STLExtras.h"
> #include "llvm/Support/DataTypes.h"
> #include "llvm/Support/type_traits.h"
> #include <cassert>
> #include <cstring>
> #include <algorithm>
> #include <utility>
> #include <iterator>
>
> namespace llvm {
>
> /// \brief An opaque object representing a hash code.
> ///
> /// This object represents the result of hashing some entity. It is
intended to
> /// be used to implement hashtables or other hashing-based data
structures.
> /// While it wraps and exposes a numeric value, this value should not be
> /// trusted to be stable or predictable across processes or executions.
> ///
> /// In order to obtain the hash_code for an object 'x':
> /// \code
> ///   using llvm::hash_value;
> ///   llvm::hash_code code = hash_value(x);
> /// \endcode
> ///
> /// Also note that there are two numerical values which are reserved, and
the
> /// implementation ensures will never be produced for real hash_codes.
These
> /// can be used as sentinels within hashing data structures.
> class hash_code {
>   size_t value;
>
> public:
>   /// \brief Default construct a hash_code. Constructs a null code.
>   hash_code() : value() {}
>
>   /// \brief Form a hash code directly from a numerical value.
>   hash_code(size_t value) : value(value) {}
>
>   /// \brief Convert the hash code to its numerical value for use.
>   /*explicit*/ operator size_t() const { return value; }
>
>   /// \brief Get a hash_code object which corresponds to a null code.
>   ///
>   /// The null code must never be the result of any 'hash_value' calls and
can
>   /// be used to detect an unset hash_code.
>   static hash_code get_null_code() { return 0; }
>
>   /// \brief Get a hash_code object which corresponds to an invalid code.
>   ///
>   /// The invalid code must never be the result of any 'hash_value' calls.
This
>   /// can be used to flag invalid hash_codes or mark entries in a hash
table.
>   static hash_code get_invalid_code() { return -1; }

I'm sure there's a compiler out there just waiting to warn you that
you're implicitly converting -1 to an unsigned type. :-)

>   friend bool operator==(const hash_code &lhs, const hash_code &rhs) {
>     return lhs.value == rhs.value;
>   }
>   friend bool operator!=(const hash_code &lhs, const hash_code &rhs) {
>     return lhs.value != rhs.value;
>   }
>
>   /// \brief Allow a hash_code to be directly run through hash_value.
>   friend size_t hash_value(const hash_code &code) { return code.value; }
> };
>
>
> // All of the implementation details of actually computing the various
hash
> // code values are held within this namespace. These routines are included
in
> // the header file mainly to allow inlining and constant propagation.
> namespace hashing { namespace detail {

My only comment on the implementation is that, on CPUs that have
different instruction sequences for aligned and unaligned loads, you
want to be careful that the 8-byte memcpys turn into aligned loads
when hashing naturally aligned data. (But if we only care about
running on x86-64, this won't be a problem.)

Jay.
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev