Attempts at speeding up StringTableBuilder

Mon Oct 17 05:23:17 PDT 2016

On 17 October 2016 at 00:20, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>
> On Fri, Oct 14, 2016 at 11:23 AM, Rafael Espíndola via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
>>
>> I have put some effort into speeding up StringTableBuilder. The last
>> thing that worked was committed as r284249.
>>
>> The main difficulty in optimizing it is the large number of strings it
>> has to handle. In the case of xul, one of the string tables has
>> 14_375_801 strings added, out of which only 1_726_762 are unique.
>
>
> Where is time being spent?
>
> It sounds like in this case, most of the hash lookups are hits?

Correct. Going by the numbers above, about 88% of the adds
(12_649_039 of 14_375_801) find an existing entry. The experiments
with moving parts of the hash table into a side std::vector made
things slower, and it seems that is because the common task is just
confirming that the hash lookup found the correct object.
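
To make the split layout concrete, here is a minimal sketch of one way
such a table could look. It is not the code from the experiments:
SplitTable, its members, the fixed capacity and the linear probing are
all assumptions made for illustration. The point is only that, when the
slots hold indices and the strings live in a side vector, confirming a
hit takes a second, dependent load.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string_view>
#include <vector>

// Hypothetical "split" layout: the probe array holds only indices and the
// strings themselves live in a side vector. Not LLVM code.
class SplitTable {
  static constexpr std::uint32_t Empty = UINT32_MAX;
  std::vector<std::uint32_t> Slots;    // power-of-two size, linear probing
  std::vector<std::string_view> Keys;  // side vector holding the strings

public:
  explicit SplitTable(std::size_t Log2Capacity)
      : Slots(std::size_t(1) << Log2Capacity, Empty) {}

  // Returns the dense index of S, inserting it if it is not present.
  // Assumes the table never fills up; this sketch does not grow.
  // The caller must keep the string data alive.
  std::uint32_t add(std::string_view S) {
    const std::size_t Mask = Slots.size() - 1;
    std::size_t I = std::hash<std::string_view>{}(S) & Mask;
    while (Slots[I] != Empty) {
      // Confirming a hit means following the index into Keys, which is
      // a second load that depends on the first one.
      if (Keys[Slots[I]] == S)
        return Slots[I];
      I = (I + 1) & Mask;
    }
    Slots[I] = static_cast<std::uint32_t>(Keys.size());
    Keys.push_back(S);
    return Slots[I];
  }

  std::size_t size() const { return Keys.size(); }
};
```

With most adds being hits, nearly every call goes through the
Keys[Slots[I]] comparison, so in this sketch the indirection sits on the
common path rather than on a rare one.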

> It's possible to make it so there are no hash calculations that fail, but if
> we aren't spending time in either:
>
> hash misses
> or
> hash calculation
>
> this is going to be hard to beat.

Yes, that is my impression. With CachedHash the hash computation cost
is small and there are few collisions. One way to look at it is that
the hash table became almost a counting sort.
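
As an illustration of the cached-hash idea, here is a small sketch
using only the standard library. CachedKey, CachedKeyHasher and
ToyStringTable are hypothetical names, not LLVM's CachedHash or
StringTableBuilder, and std::unordered_map stands in for whatever table
is really used. The hash is computed once when the key is built, the
hasher just returns it, and equality rejects on a hash mismatch before
comparing bytes.

```cpp
#include <cstddef>
#include <functional>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for a cached-hash key: the string plus its hash,
// computed once at construction.
struct CachedKey {
  std::string_view Str;
  std::size_t Hash;

  explicit CachedKey(std::string_view S)
      : Str(S), Hash(std::hash<std::string_view>{}(S)) {}

  bool operator==(const CachedKey &O) const {
    // Cheap reject on a hash mismatch; the full compare confirms a hit.
    return Hash == O.Hash && Str == O.Str;
  }
};

// The hasher only hands back the stored value, so rehashing on table
// growth or repeated lookups never touches the string bytes again.
struct CachedKeyHasher {
  std::size_t operator()(const CachedKey &K) const { return K.Hash; }
};

// Toy table: maps each distinct string to a dense index and counts adds.
// The caller must keep the string data alive.
class ToyStringTable {
  std::unordered_map<CachedKey, std::size_t, CachedKeyHasher> Map;
  std::size_t Added = 0;

public:
  std::size_t add(std::string_view S) {
    ++Added;
    // Map.size() is evaluated before the insertion, so a new string
    // gets the next free index and a duplicate keeps its old one.
    auto It = Map.try_emplace(CachedKey(S), Map.size()).first;
    return It->second;
  }

  std::size_t added() const { return Added; }
  std::size_t unique() const { return Map.size(); }
};
```

Under these assumptions, a duplicate add costs one hash of the incoming
string, one bucket lookup, and one string comparison to confirm the
match, which leaves little room for further savings when most adds are
duplicates.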

Cheers,
Rafael