<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Dec 7, 2015 at 6:50 PM, Justin Lebar <span dir="ltr"><<a href="mailto:jlebar@google.com" target="_blank">jlebar@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">> I've made a note to discuss that in the next version of the paper. I don't intend to actually propose it at this juncture, but I agree it's worth considering<br>

<br>

</span>Fantastic, thank you!<br>

<span class=""><br>

> Under P0029, by contrast, if the standard library gives you a way to change the hash algorithm, then you can change it for all types at a stroke.<br>

<br>

</span>Strong agree.<br>

<span class=""><br>

> I had thought you were objecting to my request, but it sounds like that's not the case.<br>

<br>

</span>Sorry, there clearly are communication nuances on this list that I'm<br>

not yet hep to.  Thanks for being patient and digging out what I was<br>

trying to say.<br></blockquote><div><br></div><div>I'm new here too, so this may just be down to my lack of reading comprehension skils.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class="HOEnZb"><font color="#888888"><br>

-Justin<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

On Mon, Dec 7, 2015 at 3:27 PM, Geoffrey Romer <<a href="mailto:gromer@google.com">gromer@google.com</a>> wrote:<br>

><br>

> On Mon, Dec 7, 2015 at 2:31 PM, Justin Lebar <<a href="mailto:jlebar@google.com">jlebar@google.com</a>> wrote:<br>

>><br>

>> On Mon, Dec 7, 2015 at 10:36 AM, Geoffrey Romer <<a href="mailto:gromer@google.com">gromer@google.com</a>> wrote:<br>

>> > I'm confused. You start by saying that you're concerned that "this<br>

>> > standard<br>

>> > [meaning P0029, I guess?] may ossify the state of the world for<br>

>> > applications<br>

>> > which could benefit from using a different hash algorithm", but I don't<br>

>> > see<br>

>> > how the proposal is any worse than the status quo with respect to the<br>

>> > problems you describe.<br>

>><br>

>> It's not.<br>

>><br>

>> My mail was more to express the position that switching out hashers is<br>

>> important for many applications.  It seems that perhaps if hashing is<br>

>> being reconsidered in a big way, the ability to switch out the hasher<br>

>> should be considered at the same time, lest an opportunity be lost.<br>

>> Even if switching out hashers is out of scope for P0029, it seems that<br>

>> if it's desirable and important it should be carefully considered, in<br>

>> the same way that heterogeneous lookups are being considered.<br>

><br>

><br>

> I see, that makes sense; I've made a note to discuss that in the next<br>

> version of the paper. I don't intend to actually propose it at this<br>

> juncture, but I agree it's worth considering<br>

><br>

>><br>

>><br>

>> > Furthermore, you go on to express a desire for user<br>

>> > code to be able to control the default hash algorithm, but if anything<br>

>> > P0029<br>

>> > makes that easier, by giving you one central hash algorithm to control,<br>

>> > instead of dealing with hundreds of independent per-type hash<br>

>> > algorithms.<br>

>><br>

>> I suppose this is true from one perspective.  In particular it's<br>

>> easier to use a single hasher with complex types -- assuming that<br>

>> everyone writes their boilerplate using the templated idiom.<br>

><br>

><br>

> Sorry, I don't follow.<br>

><br>

>><br>

>><br>

>> On the other hand, the data structure using the hash -- unordered_map<br>

>> or whatever -- still controls which hasher we use.  My contention in<br>

>> the earlier e-mail is that because (a) people will largely use the<br>

>> default and (b) library writers are conservative, we will likely end<br>

>> up with possibly bad, probably slow standard-library hashers in the<br>

>> near future.  Just as replacing malloc is often a good idea, it seems<br>

>> to me that replacing the default hasher will also often be a good<br>

>> idea.  So I was asking that we consider whether it's desirable to<br>

>> design a system where it's easier to replace the default hasher.<br>

><br>

><br>

> I get all that. My point is that even if P0029 doesn't directly give you a<br>

> way to replace the hash algorithm, it gets you closer: under the status quo,<br>

> even if the standard library gave you a way to change the default hasher,<br>

> that would just give you a way to change the behavior for each type<br>

> individually. Under P0029, by contrast, if the standard library gives you a<br>

> way to change the hash algorithm, then you can change it for all types at a<br>

> stroke.<br>

><br>

>><br>

>><br>

>> Maybe this is the best we can reasonably do in that respect.  It would<br>

>> sort of be a bummer (as I understand the proposal), but I get that<br>

>> there's a lot of constraints on the problem.<br>

>><br>

>> > I take no position on<br>

>> > whether it would be a good idea to put such a mechanism in the standard<br>

>> > itself, but if you want to go that way, surely gaining implementation<br>

>> > experience in libc++ would be a good first step?<br>

>><br>

>> Sure, and I feel like this thread is part of that process?<br>

><br>

><br>

> Agreed; I had thought you were objecting to my request, but it sounds like<br>

> that's not the case.<br>

><br>

>><br>

>><br>

>> > On Sun, Dec 6, 2015 at 10:05 PM, Justin Lebar <<a href="mailto:jlebar@google.com">jlebar@google.com</a>> wrote:<br>

>> >><br>

>> >> Hi, I wanted to leave some thoughts here from my perspective as an<br>

>> >> application developer.  As context, I added heterogeneous lookup<br>

>> >> support to Google's internal dense_hash_{set,map}, and my day-to-day<br>

>> >> job involves working with lots of high-throughput hashtables.<br>

>> >><br>

>> >> From my perspective, custom hash algorithms are pretty important, and<br>

>> >> I'm concerned that this standard may ossify the state of the world for<br>

>> >> applications which could benefit from using a different hash<br>

>> >> algorithm.<br>

>> >><br>

>> >> Even if the standard library randomizes its hash (which I completely<br>

>> >> agree is a good idea), I contend that it's still going to be hard for<br>

>> >> standard library maintainers to swap out hash functions, because of<br>

>> >> the conservative nature of standard library writing.  (As evidence,<br>

>> >> consider that, last time I checked, glibc still used a malloc<br>

>> >> implementation whose throughput dropped by more than half as soon as<br>

>> >> you created a thread.)<br>

>> >><br>

>> >> I expect that the result of this conservatism is that even if a good<br>

>> >> hash function is chosen today, it will rather quickly become outdated,<br>

>> >> as hashing is an area of active development in the community at large<br>

>> >> (see e.g. xxHash, MetroHash).  I expect this trend to continue -- it's<br>

>> >> kind of the perfect nerdsnipe, and it also matters for lots of<br>

>> >> high-performance code.<br>

>> >><br>

>> >> Moreover, applications don't necessarily control the version of the<br>

>> >> standard library they link with; even if one standard library<br>

>> >> implementation uses a good hash, that's no guarantee that they all<br>

>> >> will.  As soon as any one of the standard libraries I support uses a<br>

>> >> hash that's too slow or somehow weak (perhaps its randomization is<br>

>> >> weak and doesn't prevent flooding), I have to make nontrivial changes<br>

>> >> to all of my code.<br>

>> >><br>

>> >> So I contend that "standard library writers should use a good, fast<br>

>> >> hash" implies "at least some standard libraries will use a possibly<br>

>> >> bad, probably slow hash" some time in the relatively near future, just<br>

>> >> as today some standard libraries use crappy malloc implementations.<br>

>> >> And so just as performance-sensitive applications commonly replace<br>

>> >> malloc, I expect many performance-sensitive applications will want to<br>

>> >> control their default hasher.<br>

>> >><br>

>> >> Indeed, as the proposal says, "we must plan for hash algorithms that<br>

>> >> have a finite (and probably short) operational life," and my<br>

>> >> expectation is that there's a lot of carefully-maintained user code is<br>

>> >> going to better able to play this cat-and-mouse game than the compiler<br>

>> >> / standard library headers.<br>

>> >><br>

>> >> One option is to say, fine, these applications can continue to use a<br>

>> >> custom hash functor just as they do today.  But as the proposal says,<br>

>> >> ergonomics and experience suggest that most people are just going to<br>

>> >> use the default.  Therefore my suggestion is, it would be really nice<br>

>> >> if there were an easy way for application developers to swap in a new<br>

>> >> hasher, even if it's only for a set of explicitly-listed types.<br>

>> >><br>

>> >> I'll refrain from saying something dumb by not suggesting how this<br>

>> >> might be specified.  :)<br>

>> >><br>

>> >> -Justin<br>

>> ><br>

>> ><br>

><br>

><br>

</div></div></blockquote></div><br></div></div>