[llvm-dev] RFC: Improving performance of HashString

Fri Apr 28 21:25:02 PDT 2017

Oh, and another option is to pass the hash value in as a parameter to
insert/find/etc.  This would work particularly well for lldb, since it's
ConstString already computes the hash to determine which lock/sub hash
table to use; if it computed a single 64-bit hash, and used the upper bits
to index the lock, and then passed the lower 32-bits to the hashtable
itself, then it'd be another performance boost.

Hmmm while I'm at it, I can parameterize the type of the hash value itself;
32-bits isn't great when you have 2M+ symbols.  Ok I'll go prototype this
and see what kind of benefit I can get.

On Fri, Apr 28, 2017 at 9:21 PM, Scott Smith <scott.smith at purestorage.com>
wrote:

> I wonder if a less invasive change would be to simply make StringMap take
> a hash function as a template parameter, so that different hash functions
> could be plugged in for different uses.
>
> Clang is more likely to have shorter symbols (all my loop variables are
> called "i" ;-), while lldb has longer symbols (my_outer_namespace::my_inner_
> namespace::my_class<template_param_1, template param_2,
> template_param_3<has_yet_another_template_param, or_two>>::......), so
> each has a different set of tradeoffs.
>
> Thoughts?
>
>
> On Fri, Apr 28, 2017 at 3:51 PM, Bruce Hoult <bruce at hoult.org> wrote:
>
>> According to...
>>
>> https://github.com/rurban/smhasher/blob/master/README.md
>>
>> Bernstein has quality problems (while xx is as good as you get in a
>> non-crypto hash), and xx is 7x (32 bit) - 12x (64 bit) faster.
>>
>> That's on long strings. It would be worth checking the startup overhead
>> for typically short identifiers in programs.
>>
>> See later on in the README:
>>
>> "When used in a hash table the instruction cache will usually beat the
>> CPU and throughput measured here. In my tests the smallest FNV1A beats the
>> fastest crc32_hw1 with Perl 5 hash tables. Even if those worse hash
>> functions will lead to more collisions, the overall speed advantage beats
>> the slightly worse quality. See e.g. A Seven-Dimensional Analysis of
>> Hashing Methods and its Implications on Query Processing for a concise
>> overview of the best hash table strategies, confirming that the simpliest
>> Mult hashing (bernstein, FNV*, x17, sdbm) always beat "better" hash
>> functions (Tabulation, Murmur, Farm, ...) when used in a hash table.
>>
>> The fast hash functions tested here are recommendable as fast for file
>> digests and maybe bigger databases, but not for 32bit hash tables."
>>
>>
>>
>> On Sat, Apr 29, 2017 at 12:57 AM, Sean Silva via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> IIRC when I talked with Chandler (who has a lot of background in
>>> hashing), the Bernstein hash function actually ends up being pretty good
>>> (as a hash function, not necessarily performance) for C/C++ identifier-like
>>> strings (despite not being that good for other types of strings), so we'll
>>> want to verify that we don't regress hash quality (which affects efficiency
>>> of the hash tables). In particular, IIRC this is the function used for
>>> Clang's identifier maps (via StringMap) and so I'd like to see some
>>> measurements that ensure that these performance improvements translate over
>>> to Clang (or at least don't regress).
>>>
>>> If Clang doesn't regress and xxHash is measurably better for other
>>> HashString workloads, then I don't see a reason not to switch to it.
>>>
>>> -- Sean Silva
>>>
>>> On Mon, Apr 24, 2017 at 5:37 PM, Scott Smith via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> I've been working on improving the startup performance of lldb, and ran
>>>> into an issue with llvm::HashString.  It works a character at a time, which
>>>> creates a long dependency chain in the processor.  On the other hand, the
>>>> function is very short, which probably works well for short identifiers.
>>>>
>>>> I don't know how the mix of identifier length seen by lldb compares
>>>> with that seen by llvm/clang; I imagine they're pretty similar.
>>>>
>>>> I have to different proposals, and wanted to gauge which would be
>>>> preferred:
>>>>
>>>> 1. Use xxhash instead.
>>>>
>>>> 2. Use the Intel native crc32q instruction to process 8 bytes at a
>>>> time, then fall back to byte at a time.  Non sse 4.2 capable processors
>>>> (either early or non Intel/AMD x86) would use the existing algorithm, or
>>>> possibly #1 above.
>>>>
>>>> For my test, both result in approximately the same # of cycles (within
>>>> 0.5%).
>>>>
>>>> #1 uses 3+% more instructions.
>>>> #2 requires (realistically) runtime detection of cpu capabilities,
>>>> because distributions would want generic x86/x86_64 compatible binaries,
>>>> not separate binaries per cpu feature.
>>>>
>>>> I'm leaning toward #1 despite the instruction increase.  My only worry
>>>> is the effect on compile times for code with lots of short identifiers.  I
>>>> haven't tested that though (and I don't have a suitable benchmark suite for
>>>> that) so for all I know I'm worrying about nothing.
>>>>
>>>> FYI the improvement is approximately 11.5% reduction in cycles for my
>>>> lldb test (b main, run, quit), so IMO it's pretty significant.
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170428/35127780/attachment.html>