[lldb-dev] Improve performance of crc32 calculation

Kamil Rytarowski via lldb-dev lldb-dev at lists.llvm.org
Thu Apr 13 04:28:32 PDT 2017


There is a good crc32c implementation (assuming we want crc32c) in DPDK
(BSD-licensed).

http://dpdk.org/browse/dpdk/tree/lib/librte_hash

It has a hardware-assisted algorithm for x86 and arm64 (if the hardware
supports it), with a fallback to a lookup-table implementation.
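
For reference, here is a minimal sketch of what a lookup-table fallback
looks like, written in Python rather than DPDK's C. The names (make_table,
crc_table_driven) are mine and not DPDK's; only the two polynomial
constants are standard. It also shows why the "assuming we want crc32c"
caveat matters: CRC32C and the CRC32 used by zlib differ only in the
polynomial, yet they produce different checksums.

```python
import zlib

CRC32_POLY = 0xEDB88320   # reflected IEEE 802.3 polynomial (zlib crc32)
CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial (crc32c)


def make_table(poly):
    """Build the 256-entry table for a reflected 32-bit CRC polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


def crc_table_driven(data, table):
    """Byte-at-a-time CRC: one table lookup per input byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ table[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


CRC32_TABLE = make_table(CRC32_POLY)
CRC32C_TABLE = make_table(CRC32C_POLY)

if __name__ == "__main__":
    msg = b"123456789"
    print(hex(crc_table_driven(msg, CRC32_TABLE)))   # same as zlib.crc32(msg)
    print(hex(crc_table_driven(msg, CRC32C_TABLE)))  # different checksum
```

On the usual check input b"123456789" the CRC32 table gives 0xCBF43926
(matching zlib.crc32) and the CRC32C table gives 0xE3069283, so a
hardware crc32c path cannot be swapped in for an existing CRC32 checksum.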

CRC32 is definitely worth merging with LLVM.

On 13.04.2017 13:28, Pavel Labath via lldb-dev wrote:
> Improving the checksumming speed is definitely a worthwhile
> contribution, but be aware that there is a pretty simple way to avoid
> computing the crc altogether, and that is to make sure your binaries
> have a build ID. This is generally as simple as adding -Wl,--build-id to
> your compiler flags.
> 
> +1 to moving the checksumming code to llvm
> 
> pl
> 
> On 13 April 2017 at 07:20, Zachary Turner via lldb-dev
> <lldb-dev at lists.llvm.org> wrote:
> 
>     I know this is outside of your initial goal, but it would be really
>     great if JamCRC could be updated in llvm to be parallel. I see that
>     you're making use of TaskRunner for the parallelism, but that looks
>     pretty generic, so perhaps that could be raised into llvm as well if
>     it helps.
> 
>     Not trying to throw extra work on you, but it seems like a really
>     good general-purpose improvement, and it would be a shame if only
>     lldb could benefit from it.
>
>     On Wed, Apr 12, 2017 at 8:35 PM Scott Smith via lldb-dev
>     <lldb-dev at lists.llvm.org> wrote:
> 
>         Ok, I stripped out the zlib crc algorithm and just left the
>         parallelism + calls to zlib's crc32_combine, but only if we are
>         actually linking with zlib.  I left those calls here (rather
>         than folding them into JamCRC) because I'm taking advantage of
>         TaskRunner to parallelize the work.
> 
>         I moved the system include block after the llvm includes, both
>         because I had to (to use the config #defines), and because it
>         fit the published coding convention.
> 
>         By itself, it reduces my test time from 55 to 47 seconds. (The
>         original time is slower than before because I pulled the latest
>         code, guess there's another slowdown to fix).
> 
>         On Wed, Apr 12, 2017 at 12:15 PM, Scott Smith
>         <scott.smith at purestorage.com> wrote:
> 
>             The algorithm included in ObjectFileELF.cpp performs a
>             byte-at-a-time computation, which causes long pipeline
>             stalls on modern processors.  Unfortunately, the polynomial
>             used is not the same one used by the SSE 4.2 crc32
>             instruction, but there are two ways to make it faster:
> 
>             1. Work on multiple bytes at a time, using multiple lookup
>             tables. (see
>             http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>             2. Compute crcs over separate regions in parallel, then
>             combine the results.  (see
>             http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream)
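
To make the first option concrete, here is a rough Python sketch of
slicing-by-4 (rather than the slicing-by-8 of the linked page, to keep it
short). It is not the code from the patch or from zlib; the names
make_slicing_tables and crc32_slice4 are made up for illustration. The
idea is that four extra-derived tables let the loop consume one 32-bit
word per iteration instead of one byte.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC32 polynomial, the one zlib's crc32 uses


def make_slicing_tables(n=4):
    """tables[0] is the classic byte table; tables[k] advances k more bytes."""
    base = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        base.append(crc)
    tables = [base]
    for k in range(1, n):
        prev = tables[k - 1]
        tables.append([(prev[i] >> 8) ^ base[prev[i] & 0xFF] for i in range(256)])
    return tables


T = make_slicing_tables()


def crc32_slice4(data, crc=0):
    """CRC32 that consumes 4 input bytes per loop iteration."""
    crc ^= 0xFFFFFFFF
    whole = len(data) - len(data) % 4
    for i in range(0, whole, 4):
        # Fold the next little-endian 32-bit word into the running CRC.
        crc ^= int.from_bytes(data[i:i + 4], "little")
        crc = (T[3][crc & 0xFF] ^ T[2][(crc >> 8) & 0xFF]
               ^ T[1][(crc >> 16) & 0xFF] ^ T[0][crc >> 24])
    for b in data[whole:]:
        # Leftover 0 to 3 bytes go through the plain byte-at-a-time step.
        crc = (crc >> 8) ^ T[0][(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    sample = bytes(range(256)) * 3 + b"tail"
    print(crc32_slice4(sample) == zlib.crc32(sample))
```

In Python this is of course not faster than zlib; it is only meant to
show that the four lookups per word are independent of each other, which
is what removes the per-byte dependency chain in a compiled version.
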
> 
>             As it happens, zlib provides functions for both:
>             1. The zlib crc32 function uses the same polynomial as
>             ObjectFileELF.cpp, and uses slicing-by-4 along with loop
>             unrolling.
>             2. The zlib library provides crc32_combine.
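
A sketch of how the second option fits together, again in Python and
under stated assumptions: Python's zlib module does not expose
crc32_combine, so the function below is my own port of the GF(2)
matrix approach zlib uses, and a ThreadPoolExecutor stands in for
lldb's TaskRunner. The helper names (gf2_times, gf2_square,
parallel_crc32) and the chunk_size/workers parameters are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

POLY = 0xEDB88320  # reflected CRC32 polynomial


def gf2_times(mat, vec):
    """Multiply a 32x32 GF(2) matrix (list of 32 column ints) by a vector."""
    total, i = 0, 0
    while vec:
        if vec & 1:
            total ^= mat[i]
        vec >>= 1
        i += 1
    return total


def gf2_square(mat):
    """Square a GF(2) matrix, doubling the number of zero bits it appends."""
    return [gf2_times(mat, col) for col in mat]


def crc32_combine(crc1, crc2, len2):
    """CRC of A+B given crc32(A), crc32(B) and len(B) in bytes."""
    if len2 <= 0:
        return crc1
    # Operator that advances a CRC over one zero bit ...
    mat = [POLY] + [1 << n for n in range(31)]
    # ... squared three times: the operator for one zero byte.
    for _ in range(3):
        mat = gf2_square(mat)
    # Apply len2 zero bytes to crc1 by repeated squaring.
    while len2:
        if len2 & 1:
            crc1 = gf2_times(mat, crc1)
        mat = gf2_square(mat)
        len2 >>= 1
    return crc1 ^ crc2


def parallel_crc32(data, chunk_size=1 << 20, workers=4):
    """Checksum fixed-size chunks concurrently, then fold the results."""
    view = memoryview(data)
    chunks = [view[i:i + chunk_size] for i in range(0, len(view), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        crcs = list(pool.map(zlib.crc32, chunks))
    total = 0
    for crc, chunk in zip(crcs, chunks):
        total = crc32_combine(total, crc, len(chunk))
    return total


if __name__ == "__main__":
    blob = bytes(range(256)) * 4096
    print(parallel_crc32(blob, chunk_size=100_000) == zlib.crc32(blob))
```

The combine step costs only a handful of 32x32 matrix operations per
chunk, independent of the chunk contents, so nearly all of the time is
spent in the per-chunk crc32 calls that run in parallel.
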
> 
>             I decided to just call out to the zlib library, since I see
>             my version of lldb already links with zlib; however, the
>             llvm CMakeLists.txt declares it optional.
> 
>             I'm including my patch that assumes zlib is always linked
>             in.  Let me know if you prefer:
>             1. I make the change conditional on having zlib (i.e. fall
>             back to the old code if zlib is not present)
>             2. I copy all the code from zlib and put it in
>             ObjectFileELF.cpp.  However, I'm going to guess that
>             requires updating some documentation to include zlib's
>             copyright notice.
> 
>             This brings startup time on my machine / my binary from 50
>             seconds down to 32.
>             (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
> 
> 
>         _______________________________________________
>         lldb-dev mailing list
>         lldb-dev at lists.llvm.org <mailto:lldb-dev at lists.llvm.org>
>         http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev>
