[cfe-dev] [libcxx] Extending the behavior of std::hash in libc++

Mon Dec 7 10:36:53 PST 2015

I'm confused. You start by saying that you're concerned that "this standard
[meaning P0029, I guess?] may ossify the state of the world for
applications which could benefit from using a different hash algorithm",
but I don't see how the proposal is any worse than the status quo with
respect to the problems you describe. Furthermore, you go on to express a
desire for user code to be able to control the default hash algorithm, but
if anything P0029 makes that easier, by giving you one central hash
algorithm to control, instead of dealing with hundreds of independent
per-type hash algorithms.

In any event, "an easy way for application developers to swap in a new
hasher" is basically exactly what I'm asking for here. I take no position
on whether it would be a good idea to put such a mechanism in the standard
itself, but if you want to go that way, surely gaining implementation
experience in libc++ would be a good first step?

On Sun, Dec 6, 2015 at 10:05 PM, Justin Lebar <jlebar at google.com> wrote:

> Hi, I wanted to leave some thoughts here from my perspective as an
> application developer.  As context, I added heterogeneous lookup
> support to Google's internal dense_hash_{set,map}, and my day-to-day
> job involves working with lots of high-throughput hashtables.
>
> From my perspective, custom hash algorithms are pretty important, and
> I'm concerned that this standard may ossify the state of the world for
> applications which could benefit from using a different hash
> algorithm.
>
> Even if the standard library randomizes its hash (which I completely
> agree is a good idea), I contend that it's still going to be hard for
> standard library maintainers to swap out hash functions, because of
> the conservative nature of standard library writing.  (As evidence,
> consider that, last time I checked, glibc still used a malloc
> implementation whose throughput dropped by more than half as soon as
> you created a thread.)
>
> I expect that the result of this conservatism is that even if a good
> hash function is chosen today, it will rather quickly become outdated,
> as hashing is an area of active development in the community at large
> (see e.g. xxHash, MetroHash).  I expect this trend to continue -- it's
> kind of the perfect nerdsnipe, and it also matters for lots of
> high-performance code.
>
> Moreover, applications don't necessarily control the version of the
> standard library they link with; even if one standard library
> implementation uses a good hash, that's no guarantee that they all
> will.  As soon as any one of the standard libraries I support uses a
> hash that's too slow or somehow weak (perhaps its randomization is
> weak and doesn't prevent flooding), I have to make nontrivial changes
> to all of my code.
>
> So I contend that "standard library writers should use a good, fast
> hash" implies "at least some standard libraries will use a possibly
> bad, probably slow hash" some time in the relatively near future, just
> as today some standard libraries use crappy malloc implementations.
> And so just as performance-sensitive applications commonly replace
> malloc, I expect many performance-sensitive applications will want to
> control their default hasher.
>
> Indeed, as the proposal says, "we must plan for hash algorithms that
> have a finite (and probably short) operational life," and my
> expectation is that there's a lot of carefully-maintained user code is
> going to better able to play this cat-and-mouse game than the compiler
> / standard library headers.
>
> One option is to say, fine, these applications can continue to use a
> custom hash functor just as they do today.  But as the proposal says,
> ergonomics and experience suggest that most people are just going to
> use the default.  Therefore my suggestion is, it would be really nice
> if there were an easy way for application developers to swap in a new
> hasher, even if it's only for a set of explicitly-listed types.
>
> I'll refrain from saying something dumb by not suggesting how this
> might be specified.  :)
>
> -Justin
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20151207/cd3e7297/attachment.html>