<div class="gmail_quote">On Fri, Feb 17, 2012 at 10:58 PM, Talin <span dir="ltr"><<a href="mailto:viridia@gmail.com">viridia@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
However, I really do need an incremental hash for the various uniquing maps which I'm attempting to optimize. Take for example the case of uniquing a constant array - the key consists of a type* pointer and an array of constant*. Those data fields are not stored contiguously in memory, so I need to hash them separately and then combine the hashes. Being able to hash the data fields in place (as opposed to copying them to a contiguous buffer) turns out to be a fairly significant win for the uniquing maps - otherwise you end up having to do a malloc just to look up a key, and that's going to be slower than any incremental hash algorithm.</blockquote>
</div><br><div>I think you have a different idea of what 'incremental hash' means from what I do, and that's leading to the confusion, because we're talking about two very different things.</div><div><br></div>
<div>The term "incremental hash function" usually is a term of art referring to a hash function with the following property:</div><div><br></div><div>Given a series of bytes (or other units if you like) in 's', and using a python-like syntax:</div>
<div><br></div><div> H(s) = H(H(s[:-1]), s[-1])</div><div><br></div><div>This means that you can re-use a hash for a smaller chunk of data when computing the hash for a larger chunk. I don't think this is what you're looking for, and I know that most efficient, high-quality, non-cryptographic hash functions don't satisfy this property.</div>
<div><br></div><div><br></div><div>What you're talking about is just being able to hash a non-contiguous set of data. That is clearly important, but there are a lot of ways to achieve it.</div><div><br></div><div>Most of the functions I'm referring to are essentially block based. For example, CityHash is based on a 64-byte block hashing system. While this can necessitate copying the data, it should never require a malloc. You simply fill a block, and flush it out. The block can be on the stack, and modern hashing algorithms will find it much faster to memcpy mall (pointer-sized) bits of data into a single contiguous N-byte (64 in the case of city) block first, so I don't think this "costs" us anything, we actually want to copy the data in, and do the hashing algorithm once.</div>
<div><br></div><div>Note that the current CityHash algorithm (especially in its current implementation) isn't really setup for this, but we've talked to the author of it as well as Austin when we were designing the standard interface, and they seemed very positive on being able to adapt these high-quality hashing algorithms to the proposed interface.</div>