[llvm-commits] PATCH: Refactored hashing for constant unique map

Sun Feb 5 17:03:23 PST 2012

On Sun, Feb 5, 2012 at 4:53 PM, Chandler Carruth <chandlerc at google.com>wrote:

> On Sun, Feb 5, 2012 at 4:43 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk>wrote:
>
>>
>> On Feb 5, 2012, at 4:31 PM, Talin wrote:
>>
>> > Patch to refactor the hashing functions for the constant unique map, to
>> avoid code duplication.
>>
>> Wouldn't it give nicer code if you kept the intermediate state as a class
>> member?
>>
>> Something like:
>>
>> IncrementalHash Hash
>> Hash.add(foo);
>> Hahs.add(bar);
>> return Hash.finish();
>
>
> If you're interested in building nice, generalized hashing infrastructure
> in Support, I think we can and should have a much better interface than
> this or the one Talin is proposing.
>
> I would look at the proposed standard interfaces for hashing that Jeffrey
> and I are proposing for the next C++ standard. We could almost certainly
> implement something very similar to it, although there are a few subtleties
> we would need to work out surrounding name lookup.
>
> Also, I don't really want such a weak hash function in Support. The
> standard libraries actually have much higher quality hash functions now.
>
> Finally, we already have some hashing  routines in the DenseSet
> implementation. I feel this refactoring is only the tip of the ice berg,
> and isn't really factored properly for use in the rest of the cases.
>
> However, as Nick hinted, I think the best solution might be to change the
> code Talin originally touched to use FoldingSetNodeID, which was carefully
> designed for such situations it seems.... A generalized hash function often
> isn't designed in that way.
>

Yeah, I'm aware of the hashing functions in DenseMapInfo and
FoldingSetNodeID...I just wasn't sure I wanted to be that ambitious. Given
that I'm not an expert on hashing, I was trying to avoid any behavior
changes to existing code. I didn't want FoldingSet to start getting
collisions because of some mistake I had made.

There's also Austin Appleby's MurmurHash3 (
http://code.google.com/p/smhasher/), which is also a much better hashing
function. If you look at the source (
http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp) you
can see where he's a number of tricky optimizations for various compilers.
And since the code is public domain, there would be no issue including it
in LLVM verbatim.

The one downside is that his implementation doesn't do incremental hashes
very well - it operates most efficiently when the data is in a contiguous
buffer. That works fine for the FoldingSetNodeID use case, but isn't
suitable for the constant uniquing case because you can't get a contiguous
array of the operands of a constant. (Internally the constants are stored
as an array of Use*, with additional data stored in the low two bits, which
we don't want to be included in the hash calculation.)

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120205/62be4bcc/attachment.html>