[lldb-dev] Improve performance of crc32 calculation

Wed Apr 12 12:45:31 PDT 2017

BTW, the JamCRC is used in writing Windows COFF object files, PGO
instrumentation, and PDB Debug Info reading / writing, so any work we do to
make it faster will benefit many parts of the toolchain.

On Wed, Apr 12, 2017 at 12:42 PM Zachary Turner <zturner at google.com> wrote:

> It would be nice if we could simply update LLVM's implementation to be
> faster.  Having multiple implementations of the same thing seems
> undesirable, especially if one (fast) implementation is always superior to
> some other reason.  i.e. there's no reason anyone would ever want to use a
> slow implementation if a fast one is available.
>
> Can we change the JamCRC implementation in LLVM to use 4-byte slicing and
> parallelize it ourselves?  This way there's no dependency on zlib, so even
> people who have non-zlib enabled builds of LLDB get the benefits of the
> fast algorithm.
>
> On Wed, Apr 12, 2017 at 12:36 PM Scott Smith <scott.smith at purestorage.com>
> wrote:
>
>> I didn't realize that existed; I just checked and it looks like there's
>> JamCRC which uses the same polynomial.  I don't know what "Jam" means in
>> this context, unless it identifies the polynomial some how?  The code is
>> also byte-at-a-time.
>>
>> Would you prefer I use JamCRC support code instead, and then change
>> JamCRC to optionally use zlib if it's available?
>>
>> On Wed, Apr 12, 2017 at 12:23 PM, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> Zlib is definitely optional and we cannot make it required.
>>>
>>> Did you check to see if llvm has a crc32 function somewhere in Support?
>>> On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev <
>>> lldb-dev at lists.llvm.org> wrote:
>>>
>>>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>>>> computation, which causes long pipeline stalls in modern processors.
>>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>>>> instruction set, but there are two ways to make it faster:
>>>>
>>>> 1. Work on multiple bytes at a time, using multiple lookup tables. (see
>>>> http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>>>> 2. Compute crcs over separate regions in parallel, then combine the
>>>> results.  (see
>>>> http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream
>>>> )
>>>>
>>>> As it happens, zlib provides functions for both:
>>>> 1. The zlib crc32 function uses the same polynomial as
>>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling.
>>>> 2. The zlib library provides crc32_combine.
>>>>
>>>> I decided to just call out to the zlib library, since I see my version
>>>> of lldb already links with zlib; however, the llvm CMakeLists.txt declares
>>>> it optional.
>>>>
>>>> I'm including my patch that assumes zlib is always linked in.  Let me
>>>> know if you prefer:
>>>> 1. I make the change conditional on having zlib (i.e. fall back to the
>>>> old code if zlib is not present)
>>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>>>> However, I'm going to guess that requires updating some documentation to
>>>> include zlib's copyright notice.
>>>>
>>>> This brings startup time on my machine / my binary from 50 seconds down
>>>> to 32.
>>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>>>
>>>> _______________________________________________
>>>> lldb-dev mailing list
>>>> lldb-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20170412/e7f99129/attachment-0001.html>