[cfe-commits] [libcxx] r159836 - in /libcxx/trunk: include/__hash_table test/containers/unord/unord.map/reserve.pass.cpp test/containers/unord/unord.multimap/reserve.pass.cpp test/containers/unord/unord.multiset/reserve.pass.cpp test/containers/unord/unord.set/reserve.pass.cpp

Joerg Sonnenberger joerg at britannica.bec.de
Fri Jul 6 20:55:34 PDT 2012

On Fri, Jul 06, 2012 at 10:58:37PM -0400, Howard Hinnant wrote:
> > So with this
> > change, power-of-two sized hash tables still have a conditional in the
> > hot path.
> Yes.  My observation is that the cost of this conditional is about 4
> machine cycles on Intel.  In real world use the branch takes the same
> path over and over, so the hardware branch predictor becomes a very
> effective optimization.  I.e. it is rare for a hash table to switch
> from prime to power-of-2 number of buckets or vice-versa, compared to
> the number of operations that constrain the hash to the number of buckets.

4 machine cycles sounds pretty awful, actually, given that the latency for
div-by-constant is on the order of 20 cycles or less on modern Intel.
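
For readers following along, the conditional under discussion has
roughly the following shape. This is a minimal sketch, not the code
from the commit; the name constrain_hash and its signature are
assumptions made for illustration, and the bucket count is assumed to
be nonzero.

```cpp
#include <cstddef>

// Hypothetical helper (name and signature are illustrative only).
// Maps a hash value to a bucket index; bucket_count must be nonzero.
inline std::size_t constrain_hash(std::size_t h, std::size_t bucket_count) {
    // A nonzero value is a power of two iff it has exactly one bit set.
    return (bucket_count & (bucket_count - 1)) == 0
               ? h & (bucket_count - 1)  // power of two: cheap mask
               : h % bucket_count;       // anything else: integer modulo
}
```

The branch is evaluated on every lookup, but for a given table it
takes the same direction until the table is rehashed to the other kind
of bucket count.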

> I wouldn't go that far at this time.  There are at least two dimensions
> to the quality of a hash function:
> 1. The extent to which it spreads entropy over all of the bits.
> 2. Speed.
> Prior to this commit I observed test cases that took advantage of both
> of these dimensions.  And maximizing both of them at once is not
> a solved problem with current technology.

A hash function that doesn't do (1) (e.g. avoid collisions on the full
value range) is just going to be crap. Nothing the container can do will
help with that. My point remains that a hash function that works well
with prime sizes will pretty much always work as well with power-of-two
sizes when using a pre-multiplication. There might be an example that
doesn't work well with a specific constant, but in general it will work
just as well. Latency for such a pre-multiplication would be 7 cycles on
the Core architecture, and AMD is similar. That would make it about as
expensive while still being less sensitive to weak hash functions.

> > The complexity of computing primes basically just goes back
> > to the argument that linear series with increments smaller than the
> > prime itself are evenly distributed.
> The need to compute a prime is amortized over the life of an unordered
> container much like the need to allocate a new capacity is amortized
> over the life of a vector.

I'm not saying that it is expensive. I am saying that the *reason* for
doing it in the first place is questionable complexity.
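
The linear-series argument referred to above can be illustrated with a
small sketch. The helper name distinct_buckets is hypothetical; it
counts how many buckets the series 0, stride, 2*stride, ... reaches
when reduced with a plain modulo and no pre-multiplication.

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper: number of distinct buckets hit by the first
// `count` terms of 0, stride, 2*stride, ... reduced modulo bucket_count.
// bucket_count must be nonzero.
inline std::size_t distinct_buckets(std::size_t stride, std::size_t count,
                                    std::size_t bucket_count) {
    std::set<std::size_t> used;
    for (std::size_t i = 0; i < count; ++i)
        used.insert((i * stride) % bucket_count);
    return used.size();
}
```

For a stride of 4, a table of 16 buckets only ever uses 4 of them,
whereas a table of 17 buckets uses all 17, because the stride shares
no factor with the prime.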

