[lldb-dev] Improve performance of crc32 calculation

Thu Apr 13 04:28:41 PDT 2017

Improving the checksumming speed is definitely a worthwhile contribution,
but be aware that there is a pretty simple way to avoid computing the CRC
altogether: make sure your binaries have a build ID. This is generally as
simple as adding -Wl,--build-id to your compiler flags.

+1 to moving the checksumming code to llvm

pl

On 13 April 2017 at 07:20, Zachary Turner via lldb-dev <
lldb-dev at lists.llvm.org> wrote:

> I know this is outside of your initial goal, but it would be really great
> if JamCRC were updated in llvm to be parallel. I see that you're making use
> of TaskRunner for the parallelism, but that looks pretty generic, so
> perhaps that could be raised into llvm as well if it helps.
>
> Not trying to throw extra work on you, but it seems like a really good
> general purpose improvement and it would be a shame if only lldb can
> benefit from it.
> On Wed, Apr 12, 2017 at 8:35 PM Scott Smith via lldb-dev <
> lldb-dev at lists.llvm.org> wrote:
>
>> OK, I stripped out the zlib crc algorithm and left just the parallelism plus
>> the calls to zlib's crc32_combine, and only when we are actually linking with
>> zlib. I left those calls here (rather than folding them into JamCRC)
>> because I'm taking advantage of TaskRunner to parallelize the work.
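A minimal sketch of the split-then-combine scheme described above, assuming a zlib-compatible CRC-32. `std::async` stands in for lldb's TaskRunner, and `crc32_plain` / `crc32_combine_plain` are hypothetical self-contained stand-ins for zlib's `crc32` and `crc32_combine`, so the example builds without zlib. The number of parts is an arbitrary parameter, not something taken from the patch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Bitwise, zlib-compatible CRC-32 (hypothetical stand-in for zlib's crc32()).
uint32_t crc32_plain(uint32_t crc, const unsigned char *p, size_t len) {
  crc = ~crc;
  while (len--) {
    crc ^= *p++;
    for (int k = 0; k < 8; ++k)
      crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
  }
  return ~crc;
}

using Gf2Matrix = std::array<uint32_t, 32>;  // entry i = image of register bit i

// GF(2) matrix * vector: XOR the columns selected by the set bits of v.
static uint32_t gf2_times(const Gf2Matrix &m, uint32_t v) {
  uint32_t s = 0;
  for (int i = 0; v != 0; ++i, v >>= 1)
    if (v & 1) s ^= m[i];
  return s;
}

static Gf2Matrix gf2_square(const Gf2Matrix &m) {
  Gf2Matrix sq{};
  for (int n = 0; n < 32; ++n) sq[n] = gf2_times(m, m[n]);
  return sq;
}

// Hypothetical stand-in for zlib's crc32_combine(): CRC(A||B) from CRC(A), CRC(B), len(B).
uint32_t crc32_combine_plain(uint32_t crc1, uint32_t crc2, uint64_t len2) {
  Gf2Matrix op{};  // operator that shifts one zero bit through the register
  op[0] = 0xEDB88320u;
  for (int n = 1; n < 32; ++n) op[n] = 1u << (n - 1);
  op = gf2_square(gf2_square(gf2_square(op)));  // now shifts one zero byte
  for (; len2 != 0; len2 >>= 1, op = gf2_square(op))
    if (len2 & 1) crc1 = gf2_times(op, crc1);
  return crc1 ^ crc2;
}

// CRC each chunk on its own task, then fold the partial CRCs left to right.
uint32_t crc32_parallel(const unsigned char *p, size_t len, size_t parts) {
  parts = std::max<size_t>(1, std::min(parts, len));
  const size_t chunk = (len + parts - 1) / parts;
  std::vector<std::future<uint32_t>> futures;
  std::vector<size_t> lengths;
  for (size_t off = 0; off < len; off += chunk) {
    const size_t n = std::min(chunk, len - off);
    lengths.push_back(n);
    futures.push_back(std::async(std::launch::async, crc32_plain, 0u, p + off, n));
  }
  uint32_t crc = 0;  // CRC of the empty prefix
  for (size_t i = 0; i < futures.size(); ++i)
    crc = crc32_combine_plain(crc, futures[i].get(), lengths[i]);
  return crc;
}
```

The final fold is serial, but each combine costs only a logarithmic number of 32x32 GF(2) matrix squarings in the chunk length, so for large files it should be negligible next to the hashing itself.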
>>
>> I moved the system include block after the llvm includes, both because I
>> had to (to use the config #defines), and because it fit the published
>> coding convention.
>>
>> By itself, it reduces my test time from 55 to 47 seconds. (The original
>> time is slower than before because I pulled the latest code; I guess there's
>> another slowdown to fix.)
>>
>> On Wed, Apr 12, 2017 at 12:15 PM, Scott Smith <
>> scott.smith at purestorage.com> wrote:
>>
>>> The algorithm included in ObjectFileELF.cpp performs a byte-at-a-time
>>> computation, which causes long pipeline stalls on modern processors.
>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>>> crc32 instruction (which computes CRC-32C), but there are two ways to make
>>> it faster:
>>>
>>> 1. Work on multiple bytes at a time, using multiple lookup tables. (see
>>> http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>>> 2. Compute crcs over separate regions in parallel, then combine the
>>> results. (see http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream)
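To make the contrast between a byte-at-a-time loop and technique 1 concrete, here is a hedged sketch of both for the reflected CRC-32 polynomial 0xEDB88320 (the one zlib uses). The function names are hypothetical, and slicing-by-4 is shown rather than the slicing-by-8 variant on the linked page because it is shorter and is the variant the message attributes to zlib.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reflected CRC-32 polynomial (IEEE 802.3 / zlib); not the CRC-32C used by SSE 4.2.
constexpr uint32_t kPoly = 0xEDB88320u;

using Table = std::array<uint32_t, 256>;

// t[0] is the classic byte table; t[k][b] is the contribution of byte b
// when k further bytes follow it.
static std::array<Table, 4> make_tables() {
  std::array<Table, 4> t{};
  for (uint32_t b = 0; b < 256; ++b) {
    uint32_t c = b;
    for (int k = 0; k < 8; ++k) c = (c & 1) ? (c >> 1) ^ kPoly : c >> 1;
    t[0][b] = c;
  }
  for (int k = 1; k < 4; ++k)
    for (uint32_t b = 0; b < 256; ++b)
      t[k][b] = (t[k - 1][b] >> 8) ^ t[0][t[k - 1][b] & 0xFFu];
  return t;
}
static const std::array<Table, 4> kTables = make_tables();

// Byte at a time: each lookup index depends on the previous iteration's result.
uint32_t crc32_bytewise(uint32_t crc, const unsigned char *p, size_t len) {
  crc = ~crc;
  while (len--) crc = kTables[0][(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
  return ~crc;
}

// Slicing-by-4: fold 4 bytes per iteration with four independent lookups.
uint32_t crc32_slice4(uint32_t crc, const unsigned char *p, size_t len) {
  crc = ~crc;
  while (len >= 4) {
    // Assemble the word little-endian from bytes, so host endianness doesn't matter.
    crc ^= static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
    crc = kTables[3][crc & 0xFFu] ^ kTables[2][(crc >> 8) & 0xFFu] ^
          kTables[1][(crc >> 16) & 0xFFu] ^ kTables[0][crc >> 24];
    p += 4;
    len -= 4;
  }
  while (len--) crc = kTables[0][(crc ^ *p++) & 0xFFu] ^ (crc >> 8);  // tail bytes
  return ~crc;
}
```

Both functions return the same values; the difference is that the four table lookups per word in `crc32_slice4` do not depend on each other, so the processor can overlap them instead of waiting on one lookup per byte.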
>>>
>>> As it happens, zlib provides functions for both:
>>> 1. The zlib crc32 function uses the same polynomial as
>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling.
>>> 2. The zlib library provides crc32_combine.
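crc32_combine works because CRC-32 is linear over GF(2): CRC(A||B) equals CRC(A) pushed through len(B) zero bytes, XORed with CRC(B). Pushing through zero bytes is a fixed 32x32 bit matrix raised to the power 8 * len(B), which can be computed by repeated squaring. The sketch below uses that same repeated-squaring idea as zlib's classic implementation but is an illustrative rewrite, not zlib's code; `crc32_combine_sketch` and `crc32_bitwise` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference bitwise CRC-32 (zlib-compatible), used here only to produce inputs.
uint32_t crc32_bitwise(uint32_t crc, const unsigned char *p, size_t len) {
  crc = ~crc;
  while (len--) {
    crc ^= *p++;
    for (int k = 0; k < 8; ++k)
      crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
  }
  return ~crc;
}

using Gf2Matrix = std::array<uint32_t, 32>;  // entry i = image of register bit i

// GF(2) matrix * vector: XOR together the columns selected by vec's set bits.
static uint32_t gf2_times(const Gf2Matrix &m, uint32_t vec) {
  uint32_t sum = 0;
  for (int i = 0; vec != 0; ++i, vec >>= 1)
    if (vec & 1) sum ^= m[i];
  return sum;
}

// Returns m * m; column n of the square is m applied to column n of m.
static Gf2Matrix gf2_square(const Gf2Matrix &m) {
  Gf2Matrix sq{};
  for (int n = 0; n < 32; ++n) sq[n] = gf2_times(m, m[n]);
  return sq;
}

// CRC of A||B given crc1 = CRC(A), crc2 = CRC(B) and len2 = length of B in bytes.
uint32_t crc32_combine_sketch(uint32_t crc1, uint32_t crc2, uint64_t len2) {
  Gf2Matrix op{};
  op[0] = 0xEDB88320u;  // bit 0 shifts out and folds in the polynomial
  for (int n = 1; n < 32; ++n) op[n] = 1u << (n - 1);  // other bits shift down
  op = gf2_square(gf2_square(gf2_square(op)));  // 8 zero bits = one zero byte
  // Binary exponentiation: for each set bit i of len2, apply 2^i zero bytes.
  for (; len2 != 0; len2 >>= 1, op = gf2_square(op))
    if (len2 & 1) crc1 = gf2_times(op, crc1);
  return crc1 ^ crc2;
}
```

Because the cost depends only on the number of bits in len2, combining the per-region results is cheap however large each region is.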
>>>
>>> I decided to just call out to the zlib library, since I see my version
>>> of lldb already links with zlib; however, the llvm CMakeLists.txt declares
>>> it optional.
>>>
>>> I'm including my patch that assumes zlib is always linked in.  Let me
>>> know if you prefer:
>>> 1. I make the change conditional on having zlib (i.e. fall back to the
>>> old code if zlib is not present)
>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>>> However, I'm guessing that would require updating some documentation to
>>> include zlib's copyright notice.
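Option 1 might look roughly like the following sketch. `HYPOTHETICAL_HAVE_ZLIB` is a placeholder macro, not a real LLVM define; a real patch would key off whatever zlib setting LLVM's generated config header exposes. zlib's `crc32()` takes a `uInt` length, so the sketch feeds very large buffers in pieces. The fallback branch is a simple bitwise loop standing in for the existing ObjectFileELF.cpp code, and `checksum_crc32` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(HYPOTHETICAL_HAVE_ZLIB)  // placeholder, see above
#include <zlib.h>
#endif

// zlib-compatible CRC-32; chaining calls continues a running checksum.
uint32_t checksum_crc32(uint32_t crc, const unsigned char *p, size_t len) {
#if defined(HYPOTHETICAL_HAVE_ZLIB)
  // crc32() takes a uInt length, so process at most 1 GiB per call.
  while (len > 0) {
    const uInt n = len > 0x40000000u ? 0x40000000u : static_cast<uInt>(len);
    crc = static_cast<uint32_t>(::crc32(crc, p, n));
    p += n;
    len -= n;
  }
  return crc;
#else
  // Fallback when zlib is absent: plain bitwise CRC-32 with the same polynomial.
  crc = ~crc;
  while (len--) {
    crc ^= *p++;
    for (int k = 0; k < 8; ++k)
      crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
  }
  return ~crc;
#endif
}
```

Both branches compute the same checksum, so callers need not care which one was compiled in.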
>>>
>>> This brings startup time for my binary on my machine down from 50 seconds
>>> to 32 seconds:
>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>>
>>>
>> _______________________________________________
>> lldb-dev mailing list
>> lldb-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>
>