[lldb-dev] Improve performance of crc32 calculation
Zachary Turner via lldb-dev
lldb-dev at lists.llvm.org
Wed Apr 12 12:45:31 PDT 2017
BTW, the JamCRC is used in writing Windows COFF object files, PGO
instrumentation, and PDB Debug Info reading / writing, so any work we do to
make it faster will benefit many parts of the toolchain.
On Wed, Apr 12, 2017 at 12:42 PM Zachary Turner <zturner at google.com> wrote:
> It would be nice if we could simply update LLVM's implementation to be
> faster. Having multiple implementations of the same thing seems
> undesirable, especially if one (fast) implementation is always superior to
> some other reason. i.e. there's no reason anyone would ever want to use a
> slow implementation if a fast one is available.
> Can we change the JamCRC implementation in LLVM to use 4-byte slicing and
> parallelize it ourselves? This way there's no dependency on zlib, so even
> people who have non-zlib enabled builds of LLDB get the benefits of the
> fast algorithm.
> On Wed, Apr 12, 2017 at 12:36 PM Scott Smith <scott.smith at purestorage.com>
>> I didn't realize that existed; I just checked and it looks like there's
>> JamCRC which uses the same polynomial. I don't know what "Jam" means in
>> this context, unless it identifies the polynomial some how? The code is
>> also byte-at-a-time.
>> Would you prefer I use JamCRC support code instead, and then change
>> JamCRC to optionally use zlib if it's available?
>> On Wed, Apr 12, 2017 at 12:23 PM, Zachary Turner <zturner at google.com>
>>> Zlib is definitely optional and we cannot make it required.
>>> Did you check to see if llvm has a crc32 function somewhere in Support?
>>> On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev <
>>> lldb-dev at lists.llvm.org> wrote:
>>>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>>>> computation, which causes long pipeline stalls in modern processors.
>>>> Unfortunately, the polynomial used is not the same one used by the SSE 4.2
>>>> instruction set, but there are two ways to make it faster:
>>>> 1. Work on multiple bytes at a time, using multiple lookup tables. (see
>>>> 2. Compute crcs over separate regions in parallel, then combine the
>>>> results. (see
>>>> As it happens, zlib provides functions for both:
>>>> 1. The zlib crc32 function uses the same polynomial as
>>>> ObjectFileELF.cpp, and uses slicing-by-4 along with loop unrolling.
>>>> 2. The zlib library provides crc32_combine.
>>>> I decided to just call out to the zlib library, since I see my version
>>>> of lldb already links with zlib; however, the llvm CMakeLists.txt declares
>>>> it optional.
>>>> I'm including my patch that assumes zlib is always linked in. Let me
>>>> know if you prefer:
>>>> 1. I make the change conditional on having zlib (i.e. fall back to the
>>>> old code if zlib is not present)
>>>> 2. I copy all the code from zlib and put it in ObjectFileELF.cpp.
>>>> However, I'm going to guess that requires updating some documentation to
>>>> include zlib's copyright notice.
>>>> This brings startup time on my machine / my binary from 50 seconds down
>>>> to 32.
>>>> (time ~/llvm/build/bin/lldb -b -o 'b main' -o 'run' MY_PROGRAM)
>>>> lldb-dev mailing list
>>>> lldb-dev at lists.llvm.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the lldb-dev