[cfe-dev] [libcxx] Extending the behavior of std::hash in libc++

Sun Dec 6 22:05:45 PST 2015

Hi, I wanted to leave some thoughts here from my perspective as an
application developer.  As context, I added heterogeneous lookup
support to Google's internal dense_hash_{set,map}, and my day-to-day
job involves working with lots of high-throughput hashtables.

>From my perspective, custom hash algorithms are pretty important, and
I'm concerned that this standard may ossify the state of the world for
applications which could benefit from using a different hash
algorithm.

Even if the standard library randomizes its hash (which I completely
agree is a good idea), I contend that it's still going to be hard for
standard library maintainers to swap out hash functions, because of
the conservative nature of standard library writing.  (As evidence,
consider that, last time I checked, glibc still used a malloc
implementation whose throughput dropped by more than half as soon as
you created a thread.)

I expect that the result of this conservatism is that even if a good
hash function is chosen today, it will rather quickly become outdated,
as hashing is an area of active development in the community at large
(see e.g. xxHash, MetroHash).  I expect this trend to continue -- it's
kind of the perfect nerdsnipe, and it also matters for lots of
high-performance code.

Moreover, applications don't necessarily control the version of the
standard library they link with; even if one standard library
implementation uses a good hash, that's no guarantee that they all
will.  As soon as any one of the standard libraries I support uses a
hash that's too slow or somehow weak (perhaps its randomization is
weak and doesn't prevent flooding), I have to make nontrivial changes
to all of my code.

So I contend that "standard library writers should use a good, fast
hash" implies "at least some standard libraries will use a possibly
bad, probably slow hash" some time in the relatively near future, just
as today some standard libraries use crappy malloc implementations.
And so just as performance-sensitive applications commonly replace
malloc, I expect many performance-sensitive applications will want to
control their default hasher.

Indeed, as the proposal says, "we must plan for hash algorithms that
have a finite (and probably short) operational life," and my
expectation is that there's a lot of carefully-maintained user code is
going to better able to play this cat-and-mouse game than the compiler
/ standard library headers.

One option is to say, fine, these applications can continue to use a
custom hash functor just as they do today.  But as the proposal says,
ergonomics and experience suggest that most people are just going to
use the default.  Therefore my suggestion is, it would be really nice
if there were an easy way for application developers to swap in a new
hasher, even if it's only for a set of explicitly-listed types.

I'll refrain from saying something dumb by not suggesting how this
might be specified.  :)

-Justin