[PATCH] D41993: [ELF] - Change shift2 constant of GNU_HASH from 6->11.

Rafael EspĂ­ndola via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 16 11:13:32 PST 2018


I have mixed feelings about this.

I can reproduce an improvement with 11:

shift=6
160.683939775 seconds time elapsed
     ( +-  0.18% )
161.236797654 seconds time elapsed
     ( +-  0.30% )

shift=11
157.118922858 seconds time elapsed
     ( +-  0.10% )
158.837348716 seconds time elapsed
     ( +-  0.18% )

But there is still a somewhat large variation between perf runs.

This also feels a bit magical, but maybe it makes perfect sense for
someone familiar with hash tables and bloom filters.

So I guess I am OK with the patch if it includes a comment saying that
the value was found experimentally.

Rui, what do you think?

Cheers,
Rafael


On 15 January 2018 at 06:52, George Rimar <grimar at accesssoftek.com> wrote:
>>Do you know why this produces a better bloom filter?
>>
>
> I think so. My thoughts are below.
>
> Bloom filter bits are calculated as:
> H1 = dl_new_hash(name);
> H2 = H1 >> shift2;
> BITMASK = (1 << (H1 % C)) | (1 << (H2 % C));
> bloom[N] |= BITMASK;
> (sample taken from https://blogs.oracle.com/ali/gnu-hash-elf-sections).
>
> As far I understand we ideally should archieve next thing when writing such filter:
> (using out code now)
>
> We apply bit 1 at first:
> Val |= uint64_t(1) << (Sym.Hash % C);
>
> Then bit 2:
> Val |= uint64_t(1) << ((Sym.Hash >> getShift2()) % C);
>
> I believe idea here is that we would like to find such shift2 constant that applying
> bit 2 to Val should change Val as often as possible,
> (Val | Bit1) ideally should be different from (Val | Bit1 | Bit2).
> So we want to use as much different bits as possible in bloom filter overall.
>
> That was why I tried to play with Shift2 initially.
>
> Today I wrote simple test. It generates N symbols with random name of random length.
> Then calculates Score for each Shift2 possible, where Score is amount of times where setting of Bit2
> changed the bloom filter entry value. (patch is attached).
> So idea was to find Shift2 so that Score is maximum.
>
> Results looks a bit strange for me:
> [Shift2] | [Score]
> [0]  -> [0]
> [1]  -> [8338]
> [2]  -> [7762]
> [3]  -> [6736]
> [4]  -> [5281]
> [5]  -> [3541]
> [6]  -> [1995]
> [7]  -> [1993]
> [8]  -> [1992]
> [9]  -> [1991]
> [10] -> [1995]
> [11] -> [1985]
> [12] -> [3501]
> [13] -> [5135]
> [14] -> [6402]
> [15] -> [7158]
> [16] -> [7640]
> [17] -> [7866]
> [18] -> [7828]
> [19] -> [7820]
> [20] -> [7823]
> [21] -> [7689]
> [22] -> [7712]
> [23] -> [7715]
> [24] -> [7608]
> [25] -> [7591]
> [26] -> [7556]
> [27] -> [7156]
> [28] -> [6788]
> [29] -> [6010]
> [30] -> [5014]
>
> So according to them, there is almost no difference between Shift2==6 and Shift2==11,
> though Shift2==12 already shows significant difference. Best results are usually in Shift2 = [15..20] and at [1].
>
> My patch changed value to 11. According to test above that should give no effect, because 11 != 12,
> but I think that random symbol names are just to far from real live names used in LLVM,
> and so probably nothing wrong with that 11 shows good results for check-llvm calls I tried.
>
> Earlier (I mentioned at bug page) I also observer good result with Shift2=14. We probably could use it
> or some other good value in [11..20] instead. I think any of them work much better than current value 6 used.
>
> George.
>


More information about the llvm-commits mailing list