[PATCH] D40683: Make .gnu.hash section smaller.

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 30 15:53:22 PST 2017


On Thu, Nov 30, 2017 at 3:48 PM, Rafael Avila de Espindola <
rafael.espindola at gmail.com> wrote:

> Can we add a test showing the number of buckets that is selected?
>
> Rui Ueyama via Phabricator <reviews at reviews.llvm.org> writes:
>
> > -// Returns a number of hash buckets to accomodate given number of
> elements.
> > -// We want to choose a moderate number that is not too small (which
> > -// causes too many hash collisions) and not too large (which wastes
> > -// disk space.)
> > -//
> > -// We return a prime number because it (is believed to) achieve good
> > -// hash distribution.
> > -static size_t getBucketSize(size_t NumSymbols) {
> > -  // List of largest prime numbers that are not greater than 2^n + 1.
> > -  for (size_t N : {131071, 65521, 32749, 16381, 8191, 4093, 2039, 1021,
> 509,
> > -                   251, 127, 61, 31, 13, 7, 3, 1})
> > -    if (N <= NumSymbols)
> > -      return N;
> > -  return 0;
> > -}
>
> It turns out that using a prime number was not worth it?
>

Looks like so. It was cargo-culted from the GNU linkers. I don't think that
the hash function for the .gnu.hash is so bad that it show some pattern
when divided by a non-prime number. You can confirm that in the hash
distribution histogram for this particular example.

>  // Add symbols to this symbol hash table. Note that this function
> >  // destructively sort a given vector -- which is needed because
> >  // GNU-style hash table places some sorting requirements.
> > @@ -1813,7 +1797,12 @@
> >      Symbols.push_back({B, Ent.StrTabOffset, hashGnu(B->getName())});
> >    }
> >
> > -  NBuckets = getBucketSize(Symbols.size());
> > +  // We chose load factor 4 for the on-disk hash table. For each hash
> > +  // collision, the dynamic linker will compare a uint32_t hash value.
> > +  // Since the integer comparison is quite fast, I believe we can make
> > +  // the load factor even larger. 4 is just a conservative choice.
> > +  NBuckets = std::max<size_t>(Symbols.size() / 4, 1);
>
> Make the comment impersonal.
>
> LGTM with that.
>
> We can make the hashes bigger again (or use prime numbers) if we get a
> benchmark showing a problem. For now I think we should take the space
> saving.
>
> Cheers,
> Rafael
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20171130/8043be2c/attachment.html>


More information about the llvm-commits mailing list