[cfe-dev] RFC: change string hash function in libc++

Howard Hinnant hhinnant at apple.com
Sat Dec 3 17:17:22 PST 2011


I'm attempting to quantify the goodness of a hash function installed into a std::unordered_set/map.  I.e. ultimately the number of collisions is determined not only by the hash function, but by the implementation of the unordered container as well.

I've put together the following test function which scores a container at any given time during its lifetime based on the number of collisions it has.  A score of 0.f indicates that all values are hashed into the same bucket.  A score of 100.f indicates that there are no collisions.

//  A quantitative grade [0, 100] of how many collisions there
//    are in an unordred_set/map.
//    0 == all elements in the same bucket
//  100 == no collisions
//  A lower max_load_factor pushes this score towards 100
//  A higher max_load_factor pushes this score towards 0
template <class C>
float
grade(const C& c)
{
    using namespace std;
    if (c.size() <= 1)
        return 100;
    float score = 0;
    size_t bc = c.bucket_count();
    for (size_t i = 0; i != bc; ++i)
    {
        size_t bs = c.bucket_size(i);
        if (bs > 1)
            score += bs - 1;
    }
    return 100*(1 - score / (c.size() - 1));
}

I'm not at this time proposing putting this code into libc++ nor even in the test suite.  I'm simply requesting comments on if this is a reasonable tool for quantifying the goodness of an unordered container's state.  I've been working with it for only a day now, but I'm happy with it.  But I'm not infallible, so before I make decisions based on this metric, I'm opening up the opportunity to trash the metric.

Howard




More information about the cfe-dev mailing list