[cfe-dev] [libcxx] Extending the behavior of std::hash in libc++

Mon Dec 7 15:27:23 PST 2015

On Mon, Dec 7, 2015 at 2:31 PM, Justin Lebar <jlebar at google.com> wrote:

> On Mon, Dec 7, 2015 at 10:36 AM, Geoffrey Romer <gromer at google.com> wrote:
> > I'm confused. You start by saying that you're concerned that "this
> standard
> > [meaning P0029, I guess?] may ossify the state of the world for
> applications
> > which could benefit from using a different hash algorithm", but I don't
> see
> > how the proposal is any worse than the status quo with respect to the
> > problems you describe.
>
> It's not.
>
> My mail was more to express the position that switching out hashers is
> important for many applications.  It seems that perhaps if hashing is
> being reconsidered in a big way, the ability to switch out the hasher
> should be considered at the same time, lest an opportunity be lost.
> Even if switching out hashers is out of scope for P0029, it seems that
> if it's desirable and important it should be carefully considered, in
> the same way that heterogeneous lookups are being considered.
>

I see, that makes sense; I've made a note to discuss that in the next
version of the paper. I don't intend to actually propose it at this
juncture, but I agree it's worth considering

>
> > Furthermore, you go on to express a desire for user
> > code to be able to control the default hash algorithm, but if anything
> P0029
> > makes that easier, by giving you one central hash algorithm to control,
> > instead of dealing with hundreds of independent per-type hash algorithms.
>
> I suppose this is true from one perspective.  In particular it's
> easier to use a single hasher with complex types -- assuming that
> everyone writes their boilerplate using the templated idiom.
>

Sorry, I don't follow.

>
> On the other hand, the data structure using the hash -- unordered_map
> or whatever -- still controls which hasher we use.  My contention in
> the earlier e-mail is that because (a) people will largely use the
> default and (b) library writers are conservative, we will likely end
> up with possibly bad, probably slow standard-library hashers in the
> near future.  Just as replacing malloc is often a good idea, it seems
> to me that replacing the default hasher will also often be a good
> idea.  So I was asking that we consider whether it's desirable to
> design a system where it's easier to replace the default hasher.
>

I get all that. My point is that even if P0029 doesn't directly give you a
way to replace the hash algorithm, it gets you closer: under the status
quo, even if the standard library gave you a way to change the default
hasher, that would just give you a way to change the behavior for each type
individually. Under P0029, by contrast, if the standard library gives you a
way to change the hash algorithm, then you can change it for all types at a
stroke.

>
> Maybe this is the best we can reasonably do in that respect.  It would
> sort of be a bummer (as I understand the proposal), but I get that
> there's a lot of constraints on the problem.
>
> > I take no position on
> > whether it would be a good idea to put such a mechanism in the standard
> > itself, but if you want to go that way, surely gaining implementation
> > experience in libc++ would be a good first step?
>
> Sure, and I feel like this thread is part of that process?
>

Agreed; I had thought you were objecting to my request, but it sounds like
that's not the case.

>
> > On Sun, Dec 6, 2015 at 10:05 PM, Justin Lebar <jlebar at google.com> wrote:
> >>
> >> Hi, I wanted to leave some thoughts here from my perspective as an
> >> application developer.  As context, I added heterogeneous lookup
> >> support to Google's internal dense_hash_{set,map}, and my day-to-day
> >> job involves working with lots of high-throughput hashtables.
> >>
> >> From my perspective, custom hash algorithms are pretty important, and
> >> I'm concerned that this standard may ossify the state of the world for
> >> applications which could benefit from using a different hash
> >> algorithm.
> >>
> >> Even if the standard library randomizes its hash (which I completely
> >> agree is a good idea), I contend that it's still going to be hard for
> >> standard library maintainers to swap out hash functions, because of
> >> the conservative nature of standard library writing.  (As evidence,
> >> consider that, last time I checked, glibc still used a malloc
> >> implementation whose throughput dropped by more than half as soon as
> >> you created a thread.)
> >>
> >> I expect that the result of this conservatism is that even if a good
> >> hash function is chosen today, it will rather quickly become outdated,
> >> as hashing is an area of active development in the community at large
> >> (see e.g. xxHash, MetroHash).  I expect this trend to continue -- it's
> >> kind of the perfect nerdsnipe, and it also matters for lots of
> >> high-performance code.
> >>
> >> Moreover, applications don't necessarily control the version of the
> >> standard library they link with; even if one standard library
> >> implementation uses a good hash, that's no guarantee that they all
> >> will.  As soon as any one of the standard libraries I support uses a
> >> hash that's too slow or somehow weak (perhaps its randomization is
> >> weak and doesn't prevent flooding), I have to make nontrivial changes
> >> to all of my code.
> >>
> >> So I contend that "standard library writers should use a good, fast
> >> hash" implies "at least some standard libraries will use a possibly
> >> bad, probably slow hash" some time in the relatively near future, just
> >> as today some standard libraries use crappy malloc implementations.
> >> And so just as performance-sensitive applications commonly replace
> >> malloc, I expect many performance-sensitive applications will want to
> >> control their default hasher.
> >>
> >> Indeed, as the proposal says, "we must plan for hash algorithms that
> >> have a finite (and probably short) operational life," and my
> >> expectation is that there's a lot of carefully-maintained user code is
> >> going to better able to play this cat-and-mouse game than the compiler
> >> / standard library headers.
> >>
> >> One option is to say, fine, these applications can continue to use a
> >> custom hash functor just as they do today.  But as the proposal says,
> >> ergonomics and experience suggest that most people are just going to
> >> use the default.  Therefore my suggestion is, it would be really nice
> >> if there were an easy way for application developers to swap in a new
> >> hasher, even if it's only for a set of explicitly-listed types.
> >>
> >> I'll refrain from saying something dumb by not suggesting how this
> >> might be specified.  :)
> >>
> >> -Justin
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20151207/4a268235/attachment.html>