[cfe-dev] [libcxx] Extending the behavior of std::hash in libc++

Mon Dec 7 14:31:21 PST 2015

On Mon, Dec 7, 2015 at 10:36 AM, Geoffrey Romer <gromer at google.com> wrote:
> I'm confused. You start by saying that you're concerned that "this standard
> [meaning P0029, I guess?] may ossify the state of the world for applications
> which could benefit from using a different hash algorithm", but I don't see
> how the proposal is any worse than the status quo with respect to the
> problems you describe.

It's not.

My mail was more to express the position that switching out hashers is
important for many applications.  It seems that perhaps if hashing is
being reconsidered in a big way, the ability to switch out the hasher
should be considered at the same time, lest an opportunity be lost.
Even if switching out hashers is out of scope for P0029, it seems that
if it's desirable and important it should be carefully considered, in
the same way that heterogeneous lookups are being considered.

> Furthermore, you go on to express a desire for user
> code to be able to control the default hash algorithm, but if anything P0029
> makes that easier, by giving you one central hash algorithm to control,
> instead of dealing with hundreds of independent per-type hash algorithms.

I suppose this is true from one perspective.  In particular it's
easier to use a single hasher with complex types -- assuming that
everyone writes their boilerplate using the templated idiom.

On the other hand, the data structure using the hash -- unordered_map
or whatever -- still controls which hasher we use.  My contention in
the earlier e-mail is that because (a) people will largely use the
default and (b) library writers are conservative, we will likely end
up with possibly bad, probably slow standard-library hashers in the
near future.  Just as replacing malloc is often a good idea, it seems
to me that replacing the default hasher will also often be a good
idea.  So I was asking that we consider whether it's desirable to
design a system where it's easier to replace the default hasher.

Maybe this is the best we can reasonably do in that respect.  It would
sort of be a bummer (as I understand the proposal), but I get that
there's a lot of constraints on the problem.

> I take no position on
> whether it would be a good idea to put such a mechanism in the standard
> itself, but if you want to go that way, surely gaining implementation
> experience in libc++ would be a good first step?

Sure, and I feel like this thread is part of that process?

> On Sun, Dec 6, 2015 at 10:05 PM, Justin Lebar <jlebar at google.com> wrote:
>>
>> Hi, I wanted to leave some thoughts here from my perspective as an
>> application developer.  As context, I added heterogeneous lookup
>> support to Google's internal dense_hash_{set,map}, and my day-to-day
>> job involves working with lots of high-throughput hashtables.
>>
>> From my perspective, custom hash algorithms are pretty important, and
>> I'm concerned that this standard may ossify the state of the world for
>> applications which could benefit from using a different hash
>> algorithm.
>>
>> Even if the standard library randomizes its hash (which I completely
>> agree is a good idea), I contend that it's still going to be hard for
>> standard library maintainers to swap out hash functions, because of
>> the conservative nature of standard library writing.  (As evidence,
>> consider that, last time I checked, glibc still used a malloc
>> implementation whose throughput dropped by more than half as soon as
>> you created a thread.)
>>
>> I expect that the result of this conservatism is that even if a good
>> hash function is chosen today, it will rather quickly become outdated,
>> as hashing is an area of active development in the community at large
>> (see e.g. xxHash, MetroHash).  I expect this trend to continue -- it's
>> kind of the perfect nerdsnipe, and it also matters for lots of
>> high-performance code.
>>
>> Moreover, applications don't necessarily control the version of the
>> standard library they link with; even if one standard library
>> implementation uses a good hash, that's no guarantee that they all
>> will.  As soon as any one of the standard libraries I support uses a
>> hash that's too slow or somehow weak (perhaps its randomization is
>> weak and doesn't prevent flooding), I have to make nontrivial changes
>> to all of my code.
>>
>> So I contend that "standard library writers should use a good, fast
>> hash" implies "at least some standard libraries will use a possibly
>> bad, probably slow hash" some time in the relatively near future, just
>> as today some standard libraries use crappy malloc implementations.
>> And so just as performance-sensitive applications commonly replace
>> malloc, I expect many performance-sensitive applications will want to
>> control their default hasher.
>>
>> Indeed, as the proposal says, "we must plan for hash algorithms that
>> have a finite (and probably short) operational life," and my
>> expectation is that there's a lot of carefully-maintained user code is
>> going to better able to play this cat-and-mouse game than the compiler
>> / standard library headers.
>>
>> One option is to say, fine, these applications can continue to use a
>> custom hash functor just as they do today.  But as the proposal says,
>> ergonomics and experience suggest that most people are just going to
>> use the default.  Therefore my suggestion is, it would be really nice
>> if there were an easy way for application developers to swap in a new
>> hasher, even if it's only for a set of explicitly-listed types.
>>
>> I'll refrain from saying something dumb by not suggesting how this
>> might be specified.  :)
>>
>> -Justin
>
>