[cfe-commits] [libcxx] r159836 - in /libcxx/trunk: include/__hash_table test/containers/unord/unord.map/reserve.pass.cpp test/containers/unord/unord.multimap/reserve.pass.cpp test/containers/unord/unord.multiset/reserve.pass.cpp test/containers/unord/unord.set/reserve.pass.cpp
Howard Hinnant
hhinnant at apple.com
Fri Jul 6 19:58:37 PDT 2012
On Jul 6, 2012, at 9:58 PM, Joerg Sonnenberger wrote:
> On Fri, Jul 06, 2012 at 05:31:14PM -0000, Howard Hinnant wrote:
>> Author: hhinnant
>> Date: Fri Jul 6 12:31:14 2012
>> New Revision: 159836
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=159836&view=rev
>> Log:
>> This commit establishes a new bucket_count policy in the unordered
>> containers: The policy now allows a power-of-2 number of buckets to
>> be requested (and that request honored) by the client.
>> And if the number of buckets is set to a power of 2, then the constraint
>> of the hash to the number of buckets uses & instead of %.
>
> Does the standard place any constraints in this area?
No.
> So with this
> change, power-of-two sized hash tables still have a conditional in the
> hot path.
Yes. My observation is that the cost of this conditional is about 4 machine cycles on Intel. In real-world use the branch takes the same path over and over, so the hardware branch predictor becomes a very effective optimization. That is, it is rare for a hash table to switch from a prime to a power-of-2 number of buckets (or vice versa), compared with the number of operations that constrain the hash to the number of buckets.
> The only justification for using non-power-of-two hashes seems
> to be extremely poor user-provided hash functions, which sounds like a
> lame excuse.
I wouldn't go that far at this time. There are at least two dimensions to the quality of a hash function:
1. The extent to which it spreads entropy over all of the bits.
2. Speed.
Prior to this commit I observed test cases that took advantage of both of these dimensions. And maximizing both of them at once is not a solved problem with current technology.
> The complexity of computing primes basically just goes back
> to the argument that linear series with increments smaller than the
> prime itself are evenly distributed.
The need to compute a prime is amortized over the life of an unordered container, much like the need to allocate a new capacity is amortized over the life of a vector. Not only is it not a driving factor, but significant optimization has gone into this effort, such that the ratio of cost-to-compute-prime to number-of-buckets actually drops dramatically as the number of buckets grows. That is: yes, you might spend more to compute a prime large enough for a million buckets than for a hundred. But the cost per bucket will be far less for the million-bucket solution than for the hundred-bucket solution. This is a solved problem, and not a research topic (unless the client intervenes and demands linear bucket growth instead of geometric growth, which is disastrous for both vectors and unordered containers).
> The same or similar properties can
> be obtained using double-width-multiplication-and-mask universal hash
> functions (hash(x) = hi(dw(x) * T), where dw casts x to a type twice the
> size and hi gives the upper half). They are no more expensive to compute
> than the double multiplication version of the modulo and significantly
> cheaper than using modulo directly on most (all?) platforms. So: why not
> kill the complexity and require power-of-two bucket counts and clients
> to use a sane hash function?
My current experience is that a truly general-purpose hash container cannot assume much about the collection of keys, nor about the function with which those keys are hashed. This is especially true when I don't even know the type of the keys, nor the algorithm behind the hash function. And yet casual users will demand decent (non-quadratic) performance.
Admittedly, my clients continually evolve, becoming more educated and more clever with every year. There may come a day when I can be assured that my clients will just assume a power-of-2 hash container and will have the tools to deal with that. But that is not what I see today.
What I see today is that it is common for my clients to produce a combination of hash function and container values that will result in significant collisions if the number of buckets is not flexible, even after there is explicit evidence that they have attempted to avoid such a situation. And then I get performance bug reports against the libc++ unordered containers. This is not to say that my clients don't know what they're doing. It is to say that there is a wide variety of unordered container implementations available today, and it takes quite sophisticated analysis to guarantee consistent performance over all of them.
The current libc++ philosophy is to be forgiving in this department, but to also expose the sharper tools if you want to try to use them. That is, I'm trying to increase the viable options for the client, not decrease them. The very strongest negative performance vectors are associated with the client attempting to set the number of buckets to one value and the libc++ implementation insisting on another value. And that is what is being improved upon here.
I.e. hash containers need a manual transmission. An automatic is fine for casual users. But we've got to have a way to override the automatic. 4-speed is barely adequate. 5 or 6 speed is much better. 8 or 10 speed would be awesome!
Howard