[llvm-dev] RFC: Improving performance of HashString
Scott Smith via llvm-dev
llvm-dev at lists.llvm.org
Fri Apr 28 21:25:02 PDT 2017
Oh, and another option is to pass the hash value in as a parameter to
insert/find/etc. This would work particularly well for lldb, since its
ConstString already computes the hash to determine which lock/sub hash
table to use; if it computed a single 64-bit hash, used the upper bits
to index the lock, and then passed the lower 32 bits to the hash table
itself, that would be another performance boost.
Hmm, while I'm at it, I can parameterize the type of the hash value itself;
32 bits isn't great when you have 2M+ symbols. OK, I'll go prototype this
and see what kind of benefit I can get.
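The split described above could look something like this; a minimal sketch
assuming 8 shards, where the shard count is a power of two (the names and
constants are hypothetical, not lldb's actual code):

```cpp
#include <cstdint>

// Hypothetical split of one 64-bit hash: the upper bits pick the
// lock/sub hash table and the lower 32 bits are reused as the bucket
// hash, so the string only has to be hashed once.
struct SplitHash {
  unsigned Shard;     // which lock / sub hash table
  uint32_t TableHash; // value handed to the hash table itself
};

inline SplitHash splitHash(uint64_t H) {
  SplitHash S;
  S.Shard = static_cast<unsigned>(H >> 61); // top 3 bits -> 8 shards
  S.TableHash = static_cast<uint32_t>(H);   // low 32 bits
  return S;
}
```

Because the shard index and the table hash come from disjoint bits of the
same 64-bit value, the string is only walked once.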
On Fri, Apr 28, 2017 at 9:21 PM, Scott Smith <scott.smith at purestorage.com>
> I wonder if a less invasive change would be to simply make StringMap take
> a hash function as a template parameter, so that different hash functions
> could be plugged in for different uses.
> Clang is more likely to have shorter symbols (all my loop variables are
> called "i" ;-), while lldb has longer symbols (my_outer_namespace::my_inner_
> namespace::my_class<template_param_1, template_param_2,
> template_param_3<has_yet_another_template_param, or_two>>::......), so
> each has a different set of tradeoffs.
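The template-parameter idea might be sketched like this, using
std::unordered_map as a stand-in for llvm::StringMap (SymbolMap and
BernsteinHash are illustrative names, and the djb2 seed/multiplier here may
differ from LLVM's actual HashString):

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

// Classic Bernstein (djb2) hash as a pluggable hash policy.
struct BernsteinHash {
  size_t operator()(std::string_view S) const {
    uint32_t H = 5381; // djb2 seed; LLVM's variant may use different constants
    for (unsigned char C : S)
      H = H * 33 + C;
    return H;
  }
};

// The hash function becomes a template parameter, so callers with
// different identifier-length profiles can plug in different hashers.
template <typename V, typename Hasher = BernsteinHash>
using SymbolMap =
    std::unordered_map<std::string, V, Hasher, std::equal_to<>>;
```

A caller who mostly sees long mangled names could instantiate SymbolMap with
a block-oriented hasher instead, without touching the short-identifier path.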
> On Fri, Apr 28, 2017 at 3:51 PM, Bruce Hoult <bruce at hoult.org> wrote:
>> According to...
>> Bernstein has quality problems (while xxHash is about as good as you get
>> in a non-crypto hash), and xxHash is 7x (32-bit) to 12x (64-bit) faster.
>> That's on long strings. It would be worth checking the startup overhead
>> for typically short identifiers in programs.
>> See later on in the README:
>> "When used in a hash table the instruction cache will usually beat the
>> CPU and throughput measured here. In my tests the smallest FNV1A beats the
>> fastest crc32_hw1 with Perl 5 hash tables. Even if those worse hash
>> functions will lead to more collisions, the overall speed advantage beats
>> the slightly worse quality. See e.g. A Seven-Dimensional Analysis of
>> Hashing Methods and its Implications on Query Processing for a concise
>> overview of the best hash table strategies, confirming that the simpliest
>> Mult hashing (bernstein, FNV*, x17, sdbm) always beat "better" hash
>> functions (Tabulation, Murmur, Farm, ...) when used in a hash table.
>> The fast hash functions tested here are recommendable as fast for file
>> digests and maybe bigger databases, but not for 32bit hash tables."
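For reference, FNV-1a, one of the "simplest Mult" hashes the quoted README
favors for hash-table use, fits in a few lines (these are the standard
32-bit FNV constants):

```cpp
#include <cstdint>
#include <string_view>

// 32-bit FNV-1a: xor in each byte, then multiply by the FNV prime.
// Tiny code footprint, which is the README's point about i-cache wins.
inline uint32_t fnv1a32(std::string_view S) {
  uint32_t H = 2166136261u; // FNV offset basis
  for (unsigned char C : S) {
    H ^= C;
    H *= 16777619u; // FNV prime
  }
  return H;
}
```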
>> On Sat, Apr 29, 2017 at 12:57 AM, Sean Silva via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>> IIRC when I talked with Chandler (who has a lot of background in
>>> hashing), the Bernstein hash function actually ends up being pretty good
>>> (as a hash function, not necessarily performance) for C/C++ identifier-like
>>> strings (despite not being that good for other types of strings), so we'll
>>> want to verify that we don't regress hash quality (which affects efficiency
>>> of the hash tables). In particular, IIRC this is the function used for
>>> Clang's identifier maps (via StringMap) and so I'd like to see some
>>> measurements that ensure that these performance improvements translate over
>>> to Clang (or at least don't regress).
>>> If Clang doesn't regress and xxHash is measurably better for other
>>> HashString workloads, then I don't see a reason not to switch to it.
>>> -- Sean Silva
>>> On Mon, Apr 24, 2017 at 5:37 PM, Scott Smith via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>> I've been working on improving the startup performance of lldb, and ran
>>>> into an issue with llvm::HashString. It works a character at a time, which
>>>> creates a long dependency chain in the processor. On the other hand, the
>>>> function is very short, which probably works well for short identifiers.
>>>> I don't know how the mix of identifier length seen by lldb compares
>>>> with that seen by llvm/clang; I imagine they're pretty similar.
>>>> I have two different proposals, and wanted to gauge which would be
>>>> preferred:
>>>> 1. Use xxhash instead.
>>>> 2. Use the Intel native crc32q instruction to process 8 bytes at a
>>>> time, then fall back to a byte at a time. Non-SSE 4.2-capable processors
>>>> (either early x86 or non-Intel/AMD x86) would use the existing algorithm,
>>>> or possibly #1 above.
>>>> For my test, both result in approximately the same number of cycles,
>>>> though #1 uses 3+% more instructions.
>>>> #2 requires (realistically) runtime detection of cpu capabilities,
>>>> because distributions would want generic x86/x86_64 compatible binaries,
>>>> not separate binaries per cpu feature.
>>>> I'm leaning toward #1 despite the instruction increase. My only worry
>>>> is the effect on compile times for code with lots of short identifiers. I
>>>> haven't tested that though (and I don't have a suitable benchmark suite for
>>>> that) so for all I know I'm worrying about nothing.
>>>> FYI the improvement is approximately 11.5% reduction in cycles for my
>>>> lldb test (b main, run, quit), so IMO it's pretty significant.
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org