[PATCH] D41993: [ELF] - Change shift2 constant of GNU_HASH from 6->11.

George Rimar via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 15 06:52:26 PST 2018


>Do you know why this produces a better bloom filter?
>

I think so. My thoughts are below.

Bloom filter bits are calculated as:
H1 = dl_new_hash(name);
H2 = H1 >> shift2;
BITMASK = (1 << (H1 % C)) | (1 << (H2 % C));
bloom[N] |= BITMASK;
(sample taken from https://blogs.oracle.com/ali/gnu-hash-elf-sections).
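
For reference, the calculation above can be sketched in C++ as follows. This is only an illustration, not the LLD code: the function names gnuHash and bloomMask are made up here, and C (the number of bits in one bloom filter word) is passed as a parameter, with 64 used in the example on the assumption of a 64-bit target.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// dl_new_hash as used for .gnu.hash: h = h * 33 + c, starting from 5381,
// truncated to 32 bits.
uint32_t gnuHash(const std::string &name) {
  uint32_t h = 5381;
  for (unsigned char c : name)
    h = h * 33 + c;
  return h;
}

// Returns the mask with the two bloom filter bits for a given hash.
// c is the word size in bits (hypothetical parameter name; 32 or 64).
uint64_t bloomMask(uint32_t h1, unsigned shift2, unsigned c) {
  assert(shift2 < 32 && (c == 32 || c == 64));
  uint32_t h2 = h1 >> shift2;
  return (uint64_t(1) << (h1 % c)) | (uint64_t(1) << (h2 % c));
}
```

For example, gnuHash("printf") is 0x156b2bb8, and with shift2 = 6 and C = 64 the two bits land at positions 56 and 46. When both positions coincide, the second bit adds nothing to the filter.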

As far as I understand, this is what we should ideally achieve when writing such a filter
(using our code now):

We set bit 1 first:
Val |= uint64_t(1) << (Sym.Hash % C);

Then bit 2:
Val |= uint64_t(1) << ((Sym.Hash >> getShift2()) % C);

I believe the idea is to find a shift2 constant such that applying
bit 2 to Val changes Val as often as possible:
(Val | Bit1) should ideally differ from (Val | Bit1 | Bit2).
That way we use as many different bits as possible in the bloom filter overall.

That is why I tried playing with Shift2 initially.

Today I wrote a simple test. It generates N symbols with random names of random length.
Then it calculates a Score for each possible Shift2, where Score is the number of times that setting Bit2
changed the bloom filter entry value (patch is attached).
So the idea was to find the Shift2 that maximizes Score.
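
A rough sketch of such an experiment is shown below. It is not the attached patch: the names randomHashes and scoreForShift2, the name length range, the alphabet, the fixed word size of 64 bits and the maskWords parameter are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// dl_new_hash, repeated here so the sketch is self-contained.
uint32_t gnuHash(const std::string &name) {
  uint32_t h = 5381;
  for (unsigned char c : name)
    h = h * 33 + c;
  return h;
}

// Generates n random lowercase names of random length (1..32, an arbitrary
// choice) and returns their hashes.
std::vector<uint32_t> randomHashes(size_t n, uint32_t seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> len(1, 32);
  std::uniform_int_distribution<int> ch('a', 'z');
  std::vector<uint32_t> hashes;
  for (size_t i = 0; i < n; ++i) {
    std::string name(len(rng), ' ');
    for (char &c : name)
      c = static_cast<char>(ch(rng));
    hashes.push_back(gnuHash(name));
  }
  return hashes;
}

// Counts how many times setting bit 2 changed the bloom filter word
// after bit 1 had already been set.
size_t scoreForShift2(const std::vector<uint32_t> &hashes, unsigned shift2,
                      size_t maskWords) {
  assert(shift2 < 32 && maskWords > 0);
  const unsigned C = 64; // assumed bits per bloom word
  std::vector<uint64_t> bloom(maskWords, 0);
  size_t score = 0;
  for (uint32_t h : hashes) {
    uint64_t &val = bloom[(h / C) % maskWords];
    val |= uint64_t(1) << (h % C);             // bit 1
    uint64_t before = val;
    val |= uint64_t(1) << ((h >> shift2) % C); // bit 2
    if (val != before)
      ++score;
  }
  return score;
}
```

With Shift2 == 0 both bits are always the same, so the score is 0 regardless of the input. The absolute numbers for other values depend on N and on the filter size chosen.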

The results look a bit strange to me:
[Shift2] | [Score]
[0]  -> [0]
[1]  -> [8338]
[2]  -> [7762]
[3]  -> [6736]
[4]  -> [5281]
[5]  -> [3541]
[6]  -> [1995]
[7]  -> [1993]
[8]  -> [1992]
[9]  -> [1991]
[10] -> [1995]
[11] -> [1985]
[12] -> [3501]
[13] -> [5135]
[14] -> [6402]
[15] -> [7158]
[16] -> [7640]
[17] -> [7866]
[18] -> [7828]
[19] -> [7820]
[20] -> [7823]
[21] -> [7689]
[22] -> [7712]
[23] -> [7715]
[24] -> [7608]
[25] -> [7591]
[26] -> [7556]
[27] -> [7156]
[28] -> [6788]
[29] -> [6010]
[30] -> [5014]

So according to them, there is almost no difference between Shift2==6 and Shift2==11,
though Shift2==12 already shows a significant difference. The best results are usually in Shift2 = [15..20] and at [1].

My patch changed the value to 11. According to the test above, that should have no effect, because 11 != 12,
but I think that random symbol names are just too far from the real-life names used in LLVM,
so there is probably nothing wrong with 11 showing good results for the check-llvm runs I tried.

Earlier (as I mentioned on the bug page) I also observed good results with Shift2=14. We could probably use it
or some other good value in [11..20] instead. I think any of them works much better than the current value of 6.

George.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.patch
Type: application/octet-stream
Size: 101258 bytes
Desc: patch.patch
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180115/d409257d/attachment.obj>

