[llvm-commits] [llvm] r152116 - /llvm/trunk/lib/VMCore/ConstantsContext.h

Jakob Stoklund Olesen stoklund at 2pi.dk
Wed Mar 7 20:14:14 PST 2012


On Mar 7, 2012, at 12:29 AM, Duncan Sands wrote:

>> I think you underestimate how slow it is to hash things using incremental calls
>> to hash_combine. The reason for the other interface is that it is *much* faster.
> 
> why is this?  Is it just that the optimizers are doing a poor job, or is there a
> more fundamental reason that means that multiple hash_combine calls can't be as
> efficient as hash_combine_range?

Good hash functions have an internal state that is much larger than the final hash. Chandler's variant of CityHash keeps six 64-bit state registers in hash_state. A large internal state is essential for cryptographic hash functions, which must prevent collisions; hashes that simply try to be fast can be less careful about losing entropy when adding new data, but they still benefit from carrying more state than they output.

The normal operation of a hash function is to accumulate entropy from all input data in the internal state. When all entropy has been collected, the internal state is projected onto the smaller output space, a size_t / hash_code in this case. The final projection is less performance sensitive than the accumulation since it only runs once per hash.

The hash_combine interface forces hashes to be computed as an expression tree. Every call to hash_combine projects onto the smaller output space prematurely, and the projection function is executed for each node of the expression tree instead of just once. This increases the likelihood of collisions, and it makes the hash more expensive to compute than it needs to be.
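To make the cost difference concrete, here is a toy sketch. It is not the code in Hashing.h: ToyState, absorb(), finalize(), the mixing constants and the FinalizeCalls counter are all made up for illustration, and the mixing is not meant to be a good hash. It only shows that a combine-style interface runs the projection once per node, while a streaming interface runs it once per hash.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical two-register state; the real hash_state is larger.
struct ToyState {
  uint64_t A = 0;
  uint64_t B = 0;
};

// Counts how often the state is collapsed to a single word.
static int FinalizeCalls = 0;

// Accumulate one word of input into the wide internal state.
inline void absorb(ToyState &S, uint64_t V) {
  S.A = (S.A ^ V) * 0x9E3779B97F4A7C15ULL;
  S.B = ((S.B << 27) | (S.B >> 37)) + S.A;
}

// Project the wide state onto the 64-bit output space.
inline uint64_t finalize(const ToyState &S) {
  ++FinalizeCalls;
  uint64_t H = S.A ^ (S.B * 0xFF51AFD7ED558CCDULL);
  H ^= H >> 33;
  return H;
}

// Combine-style: every call starts a fresh state and collapses it again.
inline uint64_t toy_combine(uint64_t L, uint64_t R) {
  ToyState S;
  absorb(S, L);
  absorb(S, R);
  return finalize(S);
}

// Expression-tree hashing: one projection per element.
inline uint64_t hash_nested(const std::vector<uint64_t> &V) {
  uint64_t H = 0;
  for (uint64_t X : V)
    H = toy_combine(H, X);
  return H;
}

// Streaming hashing: accumulate everything, project once at the end.
inline uint64_t hash_streamed(const std::vector<uint64_t> &V) {
  ToyState S;
  for (uint64_t X : V)
    absorb(S, X);
  return finalize(S);
}
```

For a four-element input, hash_nested() runs the finalizer four times and squeezes the running hash through 64 bits after every element; hash_streamed() runs it once.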

Chandler is trying to minimize the number of nodes in the expression tree by avoiding hash_combine() calls. I don't think that allocating memory in a hash function is a good alternative, though.

I would propose an interface that doesn't constantly collapse the hash state when hashing composite objects. The hash_state class can be given an interface very similar to raw_ostream:

hash_state &operator<<(hash_state &HS, const ConstantClass *CP) {
  HS << CP->getType();
  for (unsigned I = 0, E = CP->getNumOperands(); I < E; ++I)
    HS << CP->getOperand(I);
  return HS;
}

The hash_state object is created by the final consumer of the hash value, and passed around by reference to all the parts of a composite object.
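A self-contained sketch of that ownership pattern, under loud assumptions: toy_hash_state is a hypothetical stand-in for hash_state (the real class has a different layout and no such operators today), and Point, Segment and hash_of() are invented for the example. The mixing constants are arbitrary.

```cpp
#include <cstdint>

// Hypothetical stand-in for hash_state with a raw_ostream-like interface.
class toy_hash_state {
  uint64_t A = 0x9E3779B97F4A7C15ULL;
  uint64_t B = 0xC2B2AE3D27D4EB4FULL;

public:
  // Accumulate one word without collapsing the state.
  void add(uint64_t V) {
    A = (A ^ V) * 0x9E3779B97F4A7C15ULL;
    B = ((B << 27) | (B >> 37)) + A;
  }

  // Project the state onto the output space; called once by the consumer.
  uint64_t finalize() const {
    uint64_t H = A ^ (B * 0xFF51AFD7ED558CCDULL);
    H ^= H >> 33;
    return H;
  }
};

// Leaf overload: feed a single integer into the state.
inline toy_hash_state &operator<<(toy_hash_state &HS, uint64_t V) {
  HS.add(V);
  return HS;
}

struct Point {
  uint32_t X, Y;
};

struct Segment {
  Point From, To;
};

// Each part of a composite object streams itself into the shared state.
inline toy_hash_state &operator<<(toy_hash_state &HS, const Point &P) {
  return HS << uint64_t(P.X) << uint64_t(P.Y);
}

inline toy_hash_state &operator<<(toy_hash_state &HS, const Segment &S) {
  return HS << S.From << S.To;
}

// The final consumer owns the state and is the only one to finalize it.
inline uint64_t hash_of(const Segment &S) {
  toy_hash_state HS;
  HS << S;
  return HS.finalize();
}
```

Because no intermediate operator<< finalizes anything, hashing a Segment is equivalent to streaming its four coordinates into one state, and no memory is allocated along the way.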

/jakob




