[PATCH] D41993: [ELF] - Change shift2 constant of GNU_HASH from 6->11.

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 16 18:11:19 PST 2018


I believe I found a bug in writeBloomFilter, and I believe that bug is the
cause of the bad performance numbers lld-generated DSOs. I'll send a patch
soon.

On Tue, Jan 16, 2018 at 5:43 PM, Rui Ueyama <ruiu at google.com> wrote:

> I feel like this change looks a bit too arbitrary. If we can prove that it
> is in general an improvement, I'm fine with this change, but even George's
> experiment shows that 11 may not be an optimal number.
>
> Probably what we should do is to try different shift2 numbers to see if
> which one yields the best result on the fly. I'll try to see if this isn't
> too slow.
>
> On Tue, Jan 16, 2018 at 11:13 AM, Rafael EspĂ­ndola <
> rafael.espindola at gmail.com> wrote:
>
>> I have mixed feelings about this.
>>
>> I can reproduce an improvement with 11:
>>
>> shift=6
>> 160.683939775 seconds time elapsed
>>      ( +-  0.18% )
>> 161.236797654 seconds time elapsed
>>      ( +-  0.30% )
>>
>> shift=11
>> 157.118922858 seconds time elapsed
>>      ( +-  0.10% )
>> 158.837348716 seconds time elapsed
>>      ( +-  0.18% )
>>
>> But there is still a somewhat large variation between perf runs.
>>
>> This also feels a bit magical, but maybe it makes perfect sense for
>> someone familiar with hash tables and bloom filters.
>>
>> So I guess I am OK with the patch if it includes a comment saying that
>> the value was found experimentally.
>>
>> Rui, what do you think?
>>
>> Cheers,
>> Rafael
>>
>>
>> On 15 January 2018 at 06:52, George Rimar <grimar at accesssoftek.com>
>> wrote:
>> >>Do you know why this produces a better bloom filter?
>> >>
>> >
>> > I think so. My thoughts are below.
>> >
>> > Bloom filter bits are calculated as:
>> > H1 = dl_new_hash(name);
>> > H2 = H1 >> shift2;
>> > BITMASK = (1 << (H1 % C)) | (1 << (H2 % C));
>> > bloom[N] |= BITMASK;
>> > (sample taken from https://blogs.oracle.com/ali/gnu-hash-elf-sections).
>> >
>> > As far I understand we ideally should archieve next thing when writing
>> such filter:
>> > (using out code now)
>> >
>> > We apply bit 1 at first:
>> > Val |= uint64_t(1) << (Sym.Hash % C);
>> >
>> > Then bit 2:
>> > Val |= uint64_t(1) << ((Sym.Hash >> getShift2()) % C);
>> >
>> > I believe idea here is that we would like to find such shift2 constant
>> that applying
>> > bit 2 to Val should change Val as often as possible,
>> > (Val | Bit1) ideally should be different from (Val | Bit1 | Bit2).
>> > So we want to use as much different bits as possible in bloom filter
>> overall.
>> >
>> > That was why I tried to play with Shift2 initially.
>> >
>> > Today I wrote simple test. It generates N symbols with random name of
>> random length.
>> > Then calculates Score for each Shift2 possible, where Score is amount
>> of times where setting of Bit2
>> > changed the bloom filter entry value. (patch is attached).
>> > So idea was to find Shift2 so that Score is maximum.
>> >
>> > Results looks a bit strange for me:
>> > [Shift2] | [Score]
>> > [0]  -> [0]
>> > [1]  -> [8338]
>> > [2]  -> [7762]
>> > [3]  -> [6736]
>> > [4]  -> [5281]
>> > [5]  -> [3541]
>> > [6]  -> [1995]
>> > [7]  -> [1993]
>> > [8]  -> [1992]
>> > [9]  -> [1991]
>> > [10] -> [1995]
>> > [11] -> [1985]
>> > [12] -> [3501]
>> > [13] -> [5135]
>> > [14] -> [6402]
>> > [15] -> [7158]
>> > [16] -> [7640]
>> > [17] -> [7866]
>> > [18] -> [7828]
>> > [19] -> [7820]
>> > [20] -> [7823]
>> > [21] -> [7689]
>> > [22] -> [7712]
>> > [23] -> [7715]
>> > [24] -> [7608]
>> > [25] -> [7591]
>> > [26] -> [7556]
>> > [27] -> [7156]
>> > [28] -> [6788]
>> > [29] -> [6010]
>> > [30] -> [5014]
>> >
>> > So according to them, there is almost no difference between Shift2==6
>> and Shift2==11,
>> > though Shift2==12 already shows significant difference. Best results
>> are usually in Shift2 = [15..20] and at [1].
>> >
>> > My patch changed value to 11. According to test above that should give
>> no effect, because 11 != 12,
>> > but I think that random symbol names are just to far from real live
>> names used in LLVM,
>> > and so probably nothing wrong with that 11 shows good results for
>> check-llvm calls I tried.
>> >
>> > Earlier (I mentioned at bug page) I also observer good result with
>> Shift2=14. We probably could use it
>> > or some other good value in [11..20] instead. I think any of them work
>> much better than current value 6 used.
>> >
>> > George.
>> >
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180116/72b836d6/attachment.html>


More information about the llvm-commits mailing list