<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Dec 7, 2015 at 2:31 PM, Justin Lebar <span dir="ltr"><<a href="mailto:jlebar@google.com" target="_blank">jlebar@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On Mon, Dec 7, 2015 at 10:36 AM, Geoffrey Romer <<a href="mailto:gromer@google.com" target="_blank">gromer@google.com</a>> wrote:<br>

> I'm confused. You start by saying that you're concerned that "this standard<br>

> [meaning P0029, I guess?] may ossify the state of the world for applications<br>

> which could benefit from using a different hash algorithm", but I don't see<br>

> how the proposal is any worse than the status quo with respect to the<br>

> problems you describe.<br>

<br>

</span>It's not.<br>

<br>

My mail was more to express the position that switching out hashers is<br>

important for many applications.  It seems that perhaps if hashing is<br>

being reconsidered in a big way, the ability to switch out the hasher<br>

should be considered at the same time, lest an opportunity be lost.<br>

Even if switching out hashers is out of scope for P0029, it seems that<br>

if it's desirable and important it should be carefully considered, in<br>

the same way that heterogeneous lookups are being considered.<br></blockquote><div><br></div><div>I see, that makes sense; I've made a note to discuss that in the next version of the paper. I don't intend to actually propose it at this juncture, but I agree it's worth considering</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span><br>

> Furthermore, you go on to express a desire for user<br>

> code to be able to control the default hash algorithm, but if anything P0029<br>

> makes that easier, by giving you one central hash algorithm to control,<br>

> instead of dealing with hundreds of independent per-type hash algorithms.<br>

<br>

</span>I suppose this is true from one perspective.  In particular it's<br>

easier to use a single hasher with complex types -- assuming that<br>

everyone writes their boilerplate using the templated idiom.<br></blockquote><div><br></div><div>Sorry, I don't follow.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

On the other hand, the data structure using the hash -- unordered_map<br>

or whatever -- still controls which hasher we use.  My contention in<br>

the earlier e-mail is that because (a) people will largely use the<br>

default and (b) library writers are conservative, we will likely end<br>

up with possibly bad, probably slow standard-library hashers in the<br>

near future.  Just as replacing malloc is often a good idea, it seems<br>

to me that replacing the default hasher will also often be a good<br>

idea.  So I was asking that we consider whether it's desirable to<br>

design a system where it's easier to replace the default hasher.<br></blockquote><div><br></div><div>I get all that. My point is that even if P0029 doesn't directly give you a way to replace the hash algorithm, it gets you closer: under the status quo, even if the standard library gave you a way to change the default hasher, that would just give you a way to change the behavior for each type individually. Under P0029, by contrast, if the standard library gives you a way to change the hash algorithm, then you can change it for all types at a stroke.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Maybe this is the best we can reasonably do in that respect.  It would<br>

sort of be a bummer (as I understand the proposal), but I get that<br>

there's a lot of constraints on the problem.<br>

<span><br>

> I take no position on<br>

> whether it would be a good idea to put such a mechanism in the standard<br>

> itself, but if you want to go that way, surely gaining implementation<br>

> experience in libc++ would be a good first step?<br>

<br>

</span>Sure, and I feel like this thread is part of that process?<br></blockquote><div><br></div><div>Agreed; I had thought you were objecting to my request, but it sounds like that's not the case.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div><div><br>

> On Sun, Dec 6, 2015 at 10:05 PM, Justin Lebar <<a href="mailto:jlebar@google.com" target="_blank">jlebar@google.com</a>> wrote:<br>

>><br>

>> Hi, I wanted to leave some thoughts here from my perspective as an<br>

>> application developer.  As context, I added heterogeneous lookup<br>

>> support to Google's internal dense_hash_{set,map}, and my day-to-day<br>

>> job involves working with lots of high-throughput hashtables.<br>

>><br>

>> From my perspective, custom hash algorithms are pretty important, and<br>

>> I'm concerned that this standard may ossify the state of the world for<br>

>> applications which could benefit from using a different hash<br>

>> algorithm.<br>

>><br>

>> Even if the standard library randomizes its hash (which I completely<br>

>> agree is a good idea), I contend that it's still going to be hard for<br>

>> standard library maintainers to swap out hash functions, because of<br>

>> the conservative nature of standard library writing.  (As evidence,<br>

>> consider that, last time I checked, glibc still used a malloc<br>

>> implementation whose throughput dropped by more than half as soon as<br>

>> you created a thread.)<br>

>><br>

>> I expect that the result of this conservatism is that even if a good<br>

>> hash function is chosen today, it will rather quickly become outdated,<br>

>> as hashing is an area of active development in the community at large<br>

>> (see e.g. xxHash, MetroHash).  I expect this trend to continue -- it's<br>

>> kind of the perfect nerdsnipe, and it also matters for lots of<br>

>> high-performance code.<br>

>><br>

>> Moreover, applications don't necessarily control the version of the<br>

>> standard library they link with; even if one standard library<br>

>> implementation uses a good hash, that's no guarantee that they all<br>

>> will.  As soon as any one of the standard libraries I support uses a<br>

>> hash that's too slow or somehow weak (perhaps its randomization is<br>

>> weak and doesn't prevent flooding), I have to make nontrivial changes<br>

>> to all of my code.<br>

>><br>

>> So I contend that "standard library writers should use a good, fast<br>

>> hash" implies "at least some standard libraries will use a possibly<br>

>> bad, probably slow hash" some time in the relatively near future, just<br>

>> as today some standard libraries use crappy malloc implementations.<br>

>> And so just as performance-sensitive applications commonly replace<br>

>> malloc, I expect many performance-sensitive applications will want to<br>

>> control their default hasher.<br>

>><br>

>> Indeed, as the proposal says, "we must plan for hash algorithms that<br>

>> have a finite (and probably short) operational life," and my<br>

>> expectation is that there's a lot of carefully-maintained user code is<br>

>> going to better able to play this cat-and-mouse game than the compiler<br>

>> / standard library headers.<br>

>><br>

>> One option is to say, fine, these applications can continue to use a<br>

>> custom hash functor just as they do today.  But as the proposal says,<br>

>> ergonomics and experience suggest that most people are just going to<br>

>> use the default.  Therefore my suggestion is, it would be really nice<br>

>> if there were an easy way for application developers to swap in a new<br>

>> hasher, even if it's only for a set of explicitly-listed types.<br>

>><br>

>> I'll refrain from saying something dumb by not suggesting how this<br>

>> might be specified.  :)<br>

>><br>

>> -Justin<br>

><br>

><br>

</div></div></blockquote></div><br></div></div>