[lldb-dev] Improve performance of crc32 calculation

Scott Smith via lldb-dev lldb-dev at lists.llvm.org
Thu Apr 13 10:31:39 PDT 2017


Interesting.  That saves lldb startup time (after crc
improvements/parallelization) by about 1.25 seconds wall clock / 10 seconds
cpu time, but increases linking by about 2 seconds of cpu time (and an
inconsistent amount of wall clock time).  That's only a good tradeoff if
you run the debugger a lot.

If all you need is a unique id, there are cheaper ways of going about it.
The SSE crc instruction would be cheaper, or using CityHash/MurmurHash for
other cpus.  I thought it was specifically tied to that crc algorithm.  In
that case it doesn't make sense to fold this into JamCRC, since that's tied
to a difficult-to-optimize algorithm.

On Thu, Apr 13, 2017 at 4:28 AM, Pavel Labath <labath at google.com> wrote:

> Improving the checksumming speed is definitely a worthwhile contribution,
> but be aware that there is a pretty simple way to avoid computing the crc
> altogether, and that is to make sure your binaries have a build ID. This is
> generally as simple as adding -Wl,--build-id to your compiler flags.
>
> +1 to moving the checksumming code to llvm
>
> pl
>
> On 13 April 2017 at 07:20, Zachary Turner via lldb-dev <
> lldb-dev at lists.llvm.org> wrote:
>
>> I know this is outside of your initial goal, but it would be really great
>> if JamCRC be updated in llvm to be parallel. I see that you're making use
>> of TaskRunner for the parallelism, but that looks pretty generic, so
>> perhaps that could be raised into llvm as well if it helps.
>>
>> Not trying to throw extra work on you, but it seems like a really good
>> general purpose improvement and it would be a shame if only lldb can
>> benefit from it.
>> On Wed, Apr 12, 2017 at 8:35 PM Scott Smith via lldb-dev <
>> lldb-dev at lists.llvm.org> wrote:
>>
>>> Ok I stripped out the zlib crc algorithm and just left the parallelism +
>>> calls to zlib's crc32_combine, but only if we are actually linking with
>>> zlib.  I left those calls here (rather than folding them info JamCRC)
>>> because I'm taking advantage of TaskRunner to parallelize the work.
>>>
>>> I moved the system include block after the llvm includes, both because I
>>> had to (to use the config #defines), and because it fit the published
>>> coding convention.
>>>
>>> By itself, it reduces my test time from 55 to 47 seconds. (The original
>>> time is slower than before because I pulled the latest code, guess there's
>>> another slowdown to fix).
>>>
>>> On Wed, Apr 12, 2017 at 12:15 PM, Scott Smith <
>>> scott.smith at purestorage.com> wrote:
>>>
>>>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>>>> computation, which causes long pipeline stalls in modern processors.
>>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>>>> instruction set, but there are two ways to make it faster:
>>>>
>>>> 1. Work on multiple bytes at a time, using multiple lookup tables. (see
>>>> http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>>>> 2. Compute crcs over separate regions in parallel, then combine the
>>>> results.  (see http://stackoverflow.com/quest
>>>> ions/23122312/crc-calculation-of-a-mostly-static-data-stream)
>>>>
>>>> As it happens, zlib provides functions for both:
>>>> 1. The zlib crc32 function uses the same polynomial as
>>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling.
>>>> 2. The zlib library provides crc32_combine.
>>>>
>>>> I decided to just call out to the zlib library, since I see my version
>>>> of lldb already links with zlib; however, the llvm CMakeLists.txt declares
>>>> it optional.
>>>>
>>>> I'm including my patch that assumes zlib is always linked in.  Let me
>>>> know if you prefer:
>>>> 1. I make the change conditional on having zlib (i.e. fall back to the
>>>> old code if zlib is not present)
>>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>>>> However, I'm going to guess that requires updating some documentation to
>>>> include zlib's copyright notice.
>>>>
>>>> This brings startup time on my machine / my binary from 50 seconds down
>>>> to 32.
>>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>>>
>>>>
>>> _______________________________________________
>>> lldb-dev mailing list
>>> lldb-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>
>>
>> _______________________________________________
>> lldb-dev mailing list
>> lldb-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20170413/8b06a9bc/attachment-0001.html>


More information about the lldb-dev mailing list