[PATCH] D40683: Make .gnu.hash section smaller.

Thu Nov 30 15:48:51 PST 2017

Can we add a test showing the number of buckets that is selected?

Rui Ueyama via Phabricator <reviews at reviews.llvm.org> writes:

> -// Returns a number of hash buckets to accomodate given number of elements.
> -// We want to choose a moderate number that is not too small (which
> -// causes too many hash collisions) and not too large (which wastes
> -// disk space.)
> -//
> -// We return a prime number because it (is believed to) achieve good
> -// hash distribution.
> -static size_t getBucketSize(size_t NumSymbols) {
> -  // List of largest prime numbers that are not greater than 2^n + 1.
> -  for (size_t N : {131071, 65521, 32749, 16381, 8191, 4093, 2039, 1021, 509,
> -                   251, 127, 61, 31, 13, 7, 3, 1})
> -    if (N <= NumSymbols)
> -      return N;
> -  return 0;
> -}

It turns out that using a prime number was not worth it?

>  // Add symbols to this symbol hash table. Note that this function
>  // destructively sort a given vector -- which is needed because
>  // GNU-style hash table places some sorting requirements.
> @@ -1813,7 +1797,12 @@
>      Symbols.push_back({B, Ent.StrTabOffset, hashGnu(B->getName())});
>    }
>  
> -  NBuckets = getBucketSize(Symbols.size());
> +  // We chose load factor 4 for the on-disk hash table. For each hash
> +  // collision, the dynamic linker will compare a uint32_t hash value.
> +  // Since the integer comparison is quite fast, I believe we can make
> +  // the load factor even larger. 4 is just a conservative choice.
> +  NBuckets = std::max<size_t>(Symbols.size() / 4, 1);

Make the comment impersonal.

LGTM with that.

We can make the hashes bigger again (or use prime numbers) if we get a
benchmark showing a problem. For now I think we should take the space
saving.

Cheers,
Rafael