[PATCH] D41993: [ELF] - Change shift2 constant of GNU_HASH from 6->11.

Wed Jan 17 07:07:22 PST 2018

>That said, I don't think this patch makes much sense. There's no theory behind it to explain why this could make things faster (it shouldn't as long as the hash function generates evenly distributed hash value). I think what we should do is

I'll explain my theory at the end.

>
>1. first, verify that our bloom filter is really correct by taking a look at the output binary carefully, because even if the bloom filter is wrong, that doesn't cause any correctness issue, but instead it just makes things slower, and then

That is not entirely correct. If you set a few extra bits in the bloom filter, nothing too bad happens:
you only increase the number of false positives, and in that case you are right. But if you fail to set a required bit, things will most likely just stop working.
This is simple to check: I changed the position of one of the bits in our filter to break it
and immediately got a "symbol lookup error" from the loader.
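
To illustrate the asymmetry, here is a minimal sketch of the two-bit filter. The names Bloom, addHash and mayContain are hypothetical (they are not LLD or loader functions), and C = 64 assumes 64-bit bloom words. Extra bits only add false positives; a cleared bit makes a real symbol look absent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t C = 64; // assumed: bits per bloom filter word

struct Bloom {
  std::vector<uint64_t> Words; // number of words must be a power of two
  uint32_t Shift2;
};

// Set the two bits that belong to Hash.
void addHash(Bloom &B, uint32_t Hash) {
  size_t N = B.Words.size();
  assert(N != 0 && (N & (N - 1)) == 0);
  size_t I = (Hash / C) & (N - 1);
  B.Words[I] |= uint64_t(1) << (Hash % C);
  B.Words[I] |= uint64_t(1) << ((Hash >> B.Shift2) % C);
}

// Loader-style check: the symbol can be present only if both bits are set.
bool mayContain(const Bloom &B, uint32_t Hash) {
  size_t N = B.Words.size();
  assert(N != 0 && (N & (N - 1)) == 0);
  size_t I = (Hash / C) & (N - 1);
  uint64_t Mask = (uint64_t(1) << (Hash % C)) |
                  (uint64_t(1) << ((Hash >> B.Shift2) % C));
  return (B.Words[I] & Mask) == Mask;
}
```

Filling every word with all ones keeps mayContain true for any added hash, while clearing either of its two bits makes the check fail, which is the case the loader reports as a lookup error.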

So I think our filter is correct. This is based not only on the above: I also debugged gold, tweaked the LLD settings to match gold's for the sample I used,
and compared what the two linkers do. (The sample test.s and the LLD patch are attached. The sample is based on one of our tests; I had to add a few symbols that
gold adds to .dynsym, and for gold I had to pass --noinhibit-exec to produce the output.)

Gold fills the bloom filter in a slightly different order (we iterate over a list of symbols that is already sorted by bucket id, gold does not,
which changes the fill order, though the symbols in the final output are still sorted correctly). Even so, I ended up with 2 identical bloom filters.

What differed in the output was most of the hash table part of .gnu_hash. I have not had a chance to investigate it deeply yet. I guess it is related to symbol order too,
but it is probably worth checking to be sure we are not broken here. I plan to do that.

>2. understand what's the difference between our output and GNU linker's output.
>
>I believe just changing parameters and benchmarking them is not a good strategy. I want to understand it first before getting some optimized parameters based on experiments.

I think the following is worth mentioning. When we fill the bloom filter words, we calculate the index of the word as
size_t I = (Sym.Hash / C) & (MaskWords - 1);
(https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L1746)

It looks correct (and FWIW gold uses the same logic), but the noticeable thing here is that even when all
symbols have different Hash values, the "Sym.Hash / C" part collapses many of them onto the same value, so that in the list below:

Name: __bss_start, Hash: 475558360, Hash/C: 7430599
Name: _end, Hash: 2090001339, Hash/C: 32656270
Name: sym1, Hash: 2090741775, Hash/C: 32667840
Name: sym2, Hash: 2090741776, Hash/C: 32667840
Name: _edata, Hash: 3973399875, Hash/C: 62084373
Name: sym3, Hash: 2090741777, Hash/C: 32667840
Name: sym4, Hash: 2090741778, Hash/C: 32667840
Name: sym5, Hash: 2090741779, Hash/C: 32667840
Name: sym10, Hash: 275001887, Hash/C: 4296904
Name: sym6, Hash: 2090741780, Hash/C: 32667840
Name: sym11, Hash: 275001888, Hash/C: 4296904
Name: sym7, Hash: 2090741781, Hash/C: 32667840
Name: sym12, Hash: 275001889, Hash/C: 4296904
Name: sym8, Hash: 2090741782, Hash/C: 32667840
Name: sym13, Hash: 275001890, Hash/C: 4296904
Name: sym9, Hash: 2090741783, Hash/C: 32667840
Name: sym14, Hash: 275001891, Hash/C: 4296904
Name: sym15, Hash: 275001892, Hash/C: 4296904
Name: sym16, Hash: 275001893, Hash/C: 4296904

only __bss_start and _edata have I == 1; all other symbols have I == 0. I think that shows that how the bloom filter gets filled can be
very dependent on symbol names.
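
The index computation can be reproduced outside the linker with a small sketch. gnuHash below is the standard GNU hash (h = h * 33 + c, starting from 5381); wordIndex is a hypothetical helper name, and C = 64 and MaskWords = 2 are assumptions chosen to be consistent with the Hash/C and I values listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Standard GNU symbol hash; arithmetic wraps modulo 2^32.
uint32_t gnuHash(const std::string &Name) {
  uint32_t H = 5381;
  for (unsigned char Ch : Name)
    H = H * 33 + Ch;
  return H;
}

// Index of the bloom filter word that a hash selects.
// C is the number of bits per word (assumed 64 here).
size_t wordIndex(uint32_t Hash, size_t MaskWords, uint32_t C = 64) {
  assert(MaskWords != 0 && (MaskWords & (MaskWords - 1)) == 0);
  return (Hash / C) & (MaskWords - 1);
}
```

For example, gnuHash("sym1") / 64 gives 32667840, which is even, so wordIndex(gnuHash("sym1"), 2) is 0; gnuHash("__bss_start") / 64 gives the odd value 7430599, so its index is 1. Names that differ only in the last character differ only in the low bits of the hash, which the division by C throws away.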

My theory was to tweak Shift2 so that the second bit lands on a not-yet-set position as often as possible.
It came from the following idea. Imagine we have some symbols "A", "B", "C".

1) Let's assume we placed A in the bloom filter and it became 11000000.
2) Let's try to look up B. The lookup should fail. If B's first bit is 10000000, then the ideal situation for us is that B's
second bit differs from 01000000, because we do not want a false positive during lookup.
3) So when we place B, based on (2), we want to set up Shift2 so that it places the second bit in a free position,
so that the result can be e.g. 11100000.
4) Repeat 1-3 with C. We should have something like 11110000 now.

It shows it can be reasonable to set as many distinct bits as possible. It is probably not a very strong heuristic,
but it shows some good results.
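
One way to try the heuristic on a given set of hashes is to count how many distinct bits end up set for each candidate Shift2. This is only a sketch: countBloomBits is a hypothetical helper, not an LLD function, and it assumes 64-bit bloom words.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the number of distinct bits set in the bloom filter after
// inserting all Hashes with the given Shift2.
size_t countBloomBits(const std::vector<uint32_t> &Hashes, size_t MaskWords,
                      uint32_t Shift2) {
  constexpr uint32_t C = 64; // assumed: bits per bloom filter word
  assert(MaskWords != 0 && (MaskWords & (MaskWords - 1)) == 0);
  std::vector<uint64_t> Words(MaskWords, 0);
  for (uint32_t H : Hashes) {
    size_t I = (H / C) & (MaskWords - 1);
    Words[I] |= uint64_t(1) << (H % C);             // first bit
    Words[I] |= uint64_t(1) << ((H >> Shift2) % C); // second bit
  }
  size_t N = 0;
  for (uint64_t W : Words)
    N += std::bitset<64>(W).count();
  return N;
}
```

Calling it with the same hashes and MaskWords but Shift2 = 6 and Shift2 = 11 shows which value spreads the second bit over more positions for that particular symbol set. A higher count is the goal of the heuristic above, but it also means a denser filter, so it is worth checking against real lookup timings.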

George.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.s
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180117/903ac2d1/attachment-0001.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.patch
Type: application/octet-stream
Size: 129379 bytes
Desc: patch.patch
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180117/903ac2d1/attachment-0001.obj>