[cfe-dev] [libcxx] Extending the behavior of std::hash in libc++

Mon Dec 7 18:50:16 PST 2015

> I've made a note to discuss that in the next version of the paper. I don't intend to actually propose it at this juncture, but I agree it's worth considering

Fantastic, thank you!

> Under P0029, by contrast, if the standard library gives you a way to change the hash algorithm, then you can change it for all types at a stroke.

Strong agree.

> I had thought you were objecting to my request, but it sounds like that's not the case.

Sorry, there clearly are communication nuances on this list that I'm
not yet hep to.  Thanks for being patient and digging out what I was
trying to say.

-Justin

On Mon, Dec 7, 2015 at 3:27 PM, Geoffrey Romer <gromer at google.com> wrote:
>
> On Mon, Dec 7, 2015 at 2:31 PM, Justin Lebar <jlebar at google.com> wrote:
>>
>> On Mon, Dec 7, 2015 at 10:36 AM, Geoffrey Romer <gromer at google.com> wrote:
>> > I'm confused. You start by saying that you're concerned that "this
>> > standard
>> > [meaning P0029, I guess?] may ossify the state of the world for
>> > applications
>> > which could benefit from using a different hash algorithm", but I don't
>> > see
>> > how the proposal is any worse than the status quo with respect to the
>> > problems you describe.
>>
>> It's not.
>>
>> My mail was more to express the position that switching out hashers is
>> important for many applications.  It seems that perhaps if hashing is
>> being reconsidered in a big way, the ability to switch out the hasher
>> should be considered at the same time, lest an opportunity be lost.
>> Even if switching out hashers is out of scope for P0029, it seems that
>> if it's desirable and important it should be carefully considered, in
>> the same way that heterogeneous lookups are being considered.
>
>
> I see, that makes sense; I've made a note to discuss that in the next
> version of the paper. I don't intend to actually propose it at this
> juncture, but I agree it's worth considering
>
>>
>>
>> > Furthermore, you go on to express a desire for user
>> > code to be able to control the default hash algorithm, but if anything
>> > P0029
>> > makes that easier, by giving you one central hash algorithm to control,
>> > instead of dealing with hundreds of independent per-type hash
>> > algorithms.
>>
>> I suppose this is true from one perspective.  In particular it's
>> easier to use a single hasher with complex types -- assuming that
>> everyone writes their boilerplate using the templated idiom.
>
>
> Sorry, I don't follow.
>
>>
>>
>> On the other hand, the data structure using the hash -- unordered_map
>> or whatever -- still controls which hasher we use.  My contention in
>> the earlier e-mail is that because (a) people will largely use the
>> default and (b) library writers are conservative, we will likely end
>> up with possibly bad, probably slow standard-library hashers in the
>> near future.  Just as replacing malloc is often a good idea, it seems
>> to me that replacing the default hasher will also often be a good
>> idea.  So I was asking that we consider whether it's desirable to
>> design a system where it's easier to replace the default hasher.
>
>
> I get all that. My point is that even if P0029 doesn't directly give you a
> way to replace the hash algorithm, it gets you closer: under the status quo,
> even if the standard library gave you a way to change the default hasher,
> that would just give you a way to change the behavior for each type
> individually. Under P0029, by contrast, if the standard library gives you a
> way to change the hash algorithm, then you can change it for all types at a
> stroke.
>
>>
>>
>> Maybe this is the best we can reasonably do in that respect.  It would
>> sort of be a bummer (as I understand the proposal), but I get that
>> there's a lot of constraints on the problem.
>>
>> > I take no position on
>> > whether it would be a good idea to put such a mechanism in the standard
>> > itself, but if you want to go that way, surely gaining implementation
>> > experience in libc++ would be a good first step?
>>
>> Sure, and I feel like this thread is part of that process?
>
>
> Agreed; I had thought you were objecting to my request, but it sounds like
> that's not the case.
>
>>
>>
>> > On Sun, Dec 6, 2015 at 10:05 PM, Justin Lebar <jlebar at google.com> wrote:
>> >>
>> >> Hi, I wanted to leave some thoughts here from my perspective as an
>> >> application developer.  As context, I added heterogeneous lookup
>> >> support to Google's internal dense_hash_{set,map}, and my day-to-day
>> >> job involves working with lots of high-throughput hashtables.
>> >>
>> >> From my perspective, custom hash algorithms are pretty important, and
>> >> I'm concerned that this standard may ossify the state of the world for
>> >> applications which could benefit from using a different hash
>> >> algorithm.
>> >>
>> >> Even if the standard library randomizes its hash (which I completely
>> >> agree is a good idea), I contend that it's still going to be hard for
>> >> standard library maintainers to swap out hash functions, because of
>> >> the conservative nature of standard library writing.  (As evidence,
>> >> consider that, last time I checked, glibc still used a malloc
>> >> implementation whose throughput dropped by more than half as soon as
>> >> you created a thread.)
>> >>
>> >> I expect that the result of this conservatism is that even if a good
>> >> hash function is chosen today, it will rather quickly become outdated,
>> >> as hashing is an area of active development in the community at large
>> >> (see e.g. xxHash, MetroHash).  I expect this trend to continue -- it's
>> >> kind of the perfect nerdsnipe, and it also matters for lots of
>> >> high-performance code.
>> >>
>> >> Moreover, applications don't necessarily control the version of the
>> >> standard library they link with; even if one standard library
>> >> implementation uses a good hash, that's no guarantee that they all
>> >> will.  As soon as any one of the standard libraries I support uses a
>> >> hash that's too slow or somehow weak (perhaps its randomization is
>> >> weak and doesn't prevent flooding), I have to make nontrivial changes
>> >> to all of my code.
>> >>
>> >> So I contend that "standard library writers should use a good, fast
>> >> hash" implies "at least some standard libraries will use a possibly
>> >> bad, probably slow hash" some time in the relatively near future, just
>> >> as today some standard libraries use crappy malloc implementations.
>> >> And so just as performance-sensitive applications commonly replace
>> >> malloc, I expect many performance-sensitive applications will want to
>> >> control their default hasher.
>> >>
>> >> Indeed, as the proposal says, "we must plan for hash algorithms that
>> >> have a finite (and probably short) operational life," and my
>> >> expectation is that there's a lot of carefully-maintained user code is
>> >> going to better able to play this cat-and-mouse game than the compiler
>> >> / standard library headers.
>> >>
>> >> One option is to say, fine, these applications can continue to use a
>> >> custom hash functor just as they do today.  But as the proposal says,
>> >> ergonomics and experience suggest that most people are just going to
>> >> use the default.  Therefore my suggestion is, it would be really nice
>> >> if there were an easy way for application developers to swap in a new
>> >> hasher, even if it's only for a set of explicitly-listed types.
>> >>
>> >> I'll refrain from saying something dumb by not suggesting how this
>> >> might be specified.  :)
>> >>
>> >> -Justin
>> >
>> >
>
>