[lldb-dev] Improve performance of crc32 calculation
Zachary Turner via lldb-dev
lldb-dev at lists.llvm.org
Wed Apr 12 13:00:04 PDT 2017
It seems like the the crc32_combine is not too hard to implement, but we do
already have code in LLVM that is predicated on the existence of zlib, so
it seems reasonable to leave it that way. And yes, you would submit those
changes to llvm-dev. Perhaps the JamCRC implementation could be updated to
have the conditional logic already embedded, so that the only
implementation is the JamCRC implementation, and people don't have to think
about how to get the "fast" one.
On Wed, Apr 12, 2017 at 12:52 PM Scott Smith <scott.smith at purestorage.com>
wrote:
> What about the crc combining? I don't feel comfortable reimplementing
> that on my own. Can I leave that as a feature predicated on zlib?
>
> For the JamCRC improvements, I assume I submit that to llvm-dev@ instead?
>
>
> On Wed, Apr 12, 2017 at 12:45 PM, Zachary Turner <zturner at google.com>
> wrote:
>
>> BTW, the JamCRC is used in writing Windows COFF object files, PGO
>> instrumentation, and PDB Debug Info reading / writing, so any work we do to
>> make it faster will benefit many parts of the toolchain.
>>
>> On Wed, Apr 12, 2017 at 12:42 PM Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> It would be nice if we could simply update LLVM's implementation to be
>>> faster. Having multiple implementations of the same thing seems
>>> undesirable, especially if one (fast) implementation is always superior to
>>> some other reason. i.e. there's no reason anyone would ever want to use a
>>> slow implementation if a fast one is available.
>>>
>>> Can we change the JamCRC implementation in LLVM to use 4-byte slicing
>>> and parallelize it ourselves? This way there's no dependency on zlib, so
>>> even people who have non-zlib enabled builds of LLDB get the benefits of
>>> the fast algorithm.
>>>
>>> On Wed, Apr 12, 2017 at 12:36 PM Scott Smith <
>>> scott.smith at purestorage.com> wrote:
>>>
>>>> I didn't realize that existed; I just checked and it looks like there's
>>>> JamCRC which uses the same polynomial. I don't know what "Jam" means in
>>>> this context, unless it identifies the polynomial some how? The code is
>>>> also byte-at-a-time.
>>>>
>>>> Would you prefer I use JamCRC support code instead, and then change
>>>> JamCRC to optionally use zlib if it's available?
>>>>
>>>> On Wed, Apr 12, 2017 at 12:23 PM, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>> Zlib is definitely optional and we cannot make it required.
>>>>>
>>>>> Did you check to see if llvm has a crc32 function somewhere in Support?
>>>>> On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev <
>>>>> lldb-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>>>>>> computation, which causes long pipeline stalls in modern processors.
>>>>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>>>>>> instruction set, but there are two ways to make it faster:
>>>>>>
>>>>>> 1. Work on multiple bytes at a time, using multiple lookup tables.
>>>>>> (see http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>>>>>> 2. Compute crcs over separate regions in parallel, then combine the
>>>>>> results. (see
>>>>>> http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream
>>>>>> )
>>>>>>
>>>>>> As it happens, zlib provides functions for both:
>>>>>> 1. The zlib crc32 function uses the same polynomial as
>>>>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling.
>>>>>> 2. The zlib library provides crc32_combine.
>>>>>>
>>>>>> I decided to just call out to the zlib library, since I see my
>>>>>> version of lldb already links with zlib; however, the llvm CMakeLists.txt
>>>>>> declares it optional.
>>>>>>
>>>>>> I'm including my patch that assumes zlib is always linked in. Let me
>>>>>> know if you prefer:
>>>>>> 1. I make the change conditional on having zlib (i.e. fall back to
>>>>>> the old code if zlib is not present)
>>>>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>>>>>> However, I'm going to guess that requires updating some documentation to
>>>>>> include zlib's copyright notice.
>>>>>>
>>>>>> This brings startup time on my machine / my binary from 50 seconds
>>>>>> down to 32.
>>>>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>>>>>
>>>>>> _______________________________________________
>>>>>> lldb-dev mailing list
>>>>>> lldb-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>>>
>>>>>
>>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20170412/0a72c28c/attachment.html>
More information about the lldb-dev
mailing list