[lldb-dev] Improve performance of crc32 calculation
Zachary Turner via lldb-dev
lldb-dev at lists.llvm.org
Wed Apr 12 23:20:11 PDT 2017
I know this is outside of your initial goal, but it would be really great
if JamCRC be updated in llvm to be parallel. I see that you're making use
of TaskRunner for the parallelism, but that looks pretty generic, so
perhaps that could be raised into llvm as well if it helps.
Not trying to throw extra work on you, but it seems like a really good
general purpose improvement and it would be a shame if only lldb can
benefit from it.
On Wed, Apr 12, 2017 at 8:35 PM Scott Smith via lldb-dev <
lldb-dev at lists.llvm.org> wrote:
> Ok I stripped out the zlib crc algorithm and just left the parallelism +
> calls to zlib's crc32_combine, but only if we are actually linking with
> zlib. I left those calls here (rather than folding them info JamCRC)
> because I'm taking advantage of TaskRunner to parallelize the work.
>
> I moved the system include block after the llvm includes, both because I
> had to (to use the config #defines), and because it fit the published
> coding convention.
>
> By itself, it reduces my test time from 55 to 47 seconds. (The original
> time is slower than before because I pulled the latest code, guess there's
> another slowdown to fix).
>
> On Wed, Apr 12, 2017 at 12:15 PM, Scott Smith <scott.smith at purestorage.com
> > wrote:
>
>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>> computation, which causes long pipeline stalls in modern processors.
>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>> instruction set, but there are two ways to make it faster:
>>
>> 1. Work on multiple bytes at a time, using multiple lookup tables. (see
>> http://create.stephan-brumme.com/crc32/#slicing-by-8-overview)
>> 2. Compute crcs over separate regions in parallel, then combine the
>> results. (see
>> http://stackoverflow.com/questions/23122312/crc-calculation-of-a-mostly-static-data-stream
>> )
>>
>> As it happens, zlib provides functions for both:
>> 1. The zlib crc32 function uses the same polynomial as ObjectFileELF.cpp,
>> and uses slicing-by-4 along with loop unrolling.
>> 2. The zlib library provides crc32_combine.
>>
>> I decided to just call out to the zlib library, since I see my version of
>> lldb already links with zlib; however, the llvm CMakeLists.txt declares it
>> optional.
>>
>> I'm including my patch that assumes zlib is always linked in. Let me
>> know if you prefer:
>> 1. I make the change conditional on having zlib (i.e. fall back to the
>> old code if zlib is not present)
>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>> However, I'm going to guess that requires updating some documentation to
>> include zlib's copyright notice.
>>
>> This brings startup time on my machine / my binary from 50 seconds down
>> to 32.
>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>
>>
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20170413/bcdca492/attachment.html>
More information about the lldb-dev
mailing list