[cfe-dev] [libcxx] Extending the behavior of std::hash in libc++

Tue Dec 8 09:53:28 PST 2015

On Mon, Dec 7, 2015 at 6:50 PM, Justin Lebar <jlebar at google.com> wrote:

> > I've made a note to discuss that in the next version of the paper. I
> don't intend to actually propose it at this juncture, but I agree it's
> worth considering
>
> Fantastic, thank you!
>
> > Under P0029, by contrast, if the standard library gives you a way to
> change the hash algorithm, then you can change it for all types at a stroke.
>
> Strong agree.
>
> > I had thought you were objecting to my request, but it sounds like
> that's not the case.
>
> Sorry, there clearly are communication nuances on this list that I'm
> not yet hep to.  Thanks for being patient and digging out what I was
> trying to say.
>

I'm new here too, so this may just be down to my lack of reading
comprehension skils.

>
> -Justin
>
> On Mon, Dec 7, 2015 at 3:27 PM, Geoffrey Romer <gromer at google.com> wrote:
> >
> > On Mon, Dec 7, 2015 at 2:31 PM, Justin Lebar <jlebar at google.com> wrote:
> >>
> >> On Mon, Dec 7, 2015 at 10:36 AM, Geoffrey Romer <gromer at google.com>
> wrote:
> >> > I'm confused. You start by saying that you're concerned that "this
> >> > standard
> >> > [meaning P0029, I guess?] may ossify the state of the world for
> >> > applications
> >> > which could benefit from using a different hash algorithm", but I
> don't
> >> > see
> >> > how the proposal is any worse than the status quo with respect to the
> >> > problems you describe.
> >>
> >> It's not.
> >>
> >> My mail was more to express the position that switching out hashers is
> >> important for many applications.  It seems that perhaps if hashing is
> >> being reconsidered in a big way, the ability to switch out the hasher
> >> should be considered at the same time, lest an opportunity be lost.
> >> Even if switching out hashers is out of scope for P0029, it seems that
> >> if it's desirable and important it should be carefully considered, in
> >> the same way that heterogeneous lookups are being considered.
> >
> >
> > I see, that makes sense; I've made a note to discuss that in the next
> > version of the paper. I don't intend to actually propose it at this
> > juncture, but I agree it's worth considering
> >
> >>
> >>
> >> > Furthermore, you go on to express a desire for user
> >> > code to be able to control the default hash algorithm, but if anything
> >> > P0029
> >> > makes that easier, by giving you one central hash algorithm to
> control,
> >> > instead of dealing with hundreds of independent per-type hash
> >> > algorithms.
> >>
> >> I suppose this is true from one perspective.  In particular it's
> >> easier to use a single hasher with complex types -- assuming that
> >> everyone writes their boilerplate using the templated idiom.
> >
> >
> > Sorry, I don't follow.
> >
> >>
> >>
> >> On the other hand, the data structure using the hash -- unordered_map
> >> or whatever -- still controls which hasher we use.  My contention in
> >> the earlier e-mail is that because (a) people will largely use the
> >> default and (b) library writers are conservative, we will likely end
> >> up with possibly bad, probably slow standard-library hashers in the
> >> near future.  Just as replacing malloc is often a good idea, it seems
> >> to me that replacing the default hasher will also often be a good
> >> idea.  So I was asking that we consider whether it's desirable to
> >> design a system where it's easier to replace the default hasher.
> >
> >
> > I get all that. My point is that even if P0029 doesn't directly give you
> a
> > way to replace the hash algorithm, it gets you closer: under the status
> quo,
> > even if the standard library gave you a way to change the default hasher,
> > that would just give you a way to change the behavior for each type
> > individually. Under P0029, by contrast, if the standard library gives
> you a
> > way to change the hash algorithm, then you can change it for all types
> at a
> > stroke.
> >
> >>
> >>
> >> Maybe this is the best we can reasonably do in that respect.  It would
> >> sort of be a bummer (as I understand the proposal), but I get that
> >> there's a lot of constraints on the problem.
> >>
> >> > I take no position on
> >> > whether it would be a good idea to put such a mechanism in the
> standard
> >> > itself, but if you want to go that way, surely gaining implementation
> >> > experience in libc++ would be a good first step?
> >>
> >> Sure, and I feel like this thread is part of that process?
> >
> >
> > Agreed; I had thought you were objecting to my request, but it sounds
> like
> > that's not the case.
> >
> >>
> >>
> >> > On Sun, Dec 6, 2015 at 10:05 PM, Justin Lebar <jlebar at google.com>
> wrote:
> >> >>
> >> >> Hi, I wanted to leave some thoughts here from my perspective as an
> >> >> application developer.  As context, I added heterogeneous lookup
> >> >> support to Google's internal dense_hash_{set,map}, and my day-to-day
> >> >> job involves working with lots of high-throughput hashtables.
> >> >>
> >> >> From my perspective, custom hash algorithms are pretty important, and
> >> >> I'm concerned that this standard may ossify the state of the world
> for
> >> >> applications which could benefit from using a different hash
> >> >> algorithm.
> >> >>
> >> >> Even if the standard library randomizes its hash (which I completely
> >> >> agree is a good idea), I contend that it's still going to be hard for
> >> >> standard library maintainers to swap out hash functions, because of
> >> >> the conservative nature of standard library writing.  (As evidence,
> >> >> consider that, last time I checked, glibc still used a malloc
> >> >> implementation whose throughput dropped by more than half as soon as
> >> >> you created a thread.)
> >> >>
> >> >> I expect that the result of this conservatism is that even if a good
> >> >> hash function is chosen today, it will rather quickly become
> outdated,
> >> >> as hashing is an area of active development in the community at large
> >> >> (see e.g. xxHash, MetroHash).  I expect this trend to continue --
> it's
> >> >> kind of the perfect nerdsnipe, and it also matters for lots of
> >> >> high-performance code.
> >> >>
> >> >> Moreover, applications don't necessarily control the version of the
> >> >> standard library they link with; even if one standard library
> >> >> implementation uses a good hash, that's no guarantee that they all
> >> >> will.  As soon as any one of the standard libraries I support uses a
> >> >> hash that's too slow or somehow weak (perhaps its randomization is
> >> >> weak and doesn't prevent flooding), I have to make nontrivial changes
> >> >> to all of my code.
> >> >>
> >> >> So I contend that "standard library writers should use a good, fast
> >> >> hash" implies "at least some standard libraries will use a possibly
> >> >> bad, probably slow hash" some time in the relatively near future,
> just
> >> >> as today some standard libraries use crappy malloc implementations.
> >> >> And so just as performance-sensitive applications commonly replace
> >> >> malloc, I expect many performance-sensitive applications will want to
> >> >> control their default hasher.
> >> >>
> >> >> Indeed, as the proposal says, "we must plan for hash algorithms that
> >> >> have a finite (and probably short) operational life," and my
> >> >> expectation is that there's a lot of carefully-maintained user code
> is
> >> >> going to better able to play this cat-and-mouse game than the
> compiler
> >> >> / standard library headers.
> >> >>
> >> >> One option is to say, fine, these applications can continue to use a
> >> >> custom hash functor just as they do today.  But as the proposal says,
> >> >> ergonomics and experience suggest that most people are just going to
> >> >> use the default.  Therefore my suggestion is, it would be really nice
> >> >> if there were an easy way for application developers to swap in a new
> >> >> hasher, even if it's only for a set of explicitly-listed types.
> >> >>
> >> >> I'll refrain from saying something dumb by not suggesting how this
> >> >> might be specified.  :)
> >> >>
> >> >> -Justin
> >> >
> >> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20151208/54e4c145/attachment.html>