[PATCH] D32971: [scudo] CRC32 optimizations

Kostya Kortchinsky via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 8 13:11:19 PDT 2017


cryptoad added inline comments.


================
Comment at: lib/scudo/scudo_allocator.cpp:39
-
-INLINE u32 computeCRC32(u32 Crc, uptr Data, u8 HashType) {
-  // If SSE4.2 is defined here, it was enabled everywhere, as opposed to only
----------------
dvyukov wrote:
> cryptoad wrote:
> > cryptoad wrote:
> > > dvyukov wrote:
> > > > cryptoad wrote:
> > > > > dvyukov wrote:
> > > > > > It looked nice to have a separate function that wraps all of this logic. Why do you inline it?
> > > > > > I would just move the load of HashAlgorithm into this function so that it does not happen when not needed.
> > > > > Indeed, I considered this as well.
> > > > > The other advantage of doing it the new way was to have the `crc32` instructions inlined as opposed to be calls (which is another part of the performance gain), which required me to use the intrinsic within the loop. After duplicating the loop with intrinsic, I didn't see a need to keep that function.
> > > > Do you can have intrinsics in the inlinable computeCRC32 function. It should lead to the same assembly.
> > > > With intrinsics it becomes more complex, so there is even more reason to have a separate function for it. Consider that we want to calculate crc in a another place.
> > > So something like:
> > > ```
> > > INLINE u32 computeCRC32(u32 Crc, uptr Data, u8 HashType) {
> > > #if defined(__SSE4_2__) || defined(__ARM_FEATURE_CRC32)
> > >   return CRC32_INTRINSIC(Crc, Data);
> > > #else
> > >   if (atomic_load_relaxed(&HashAlgorithm) == CRC32Hardware)
> > >     return computeHardwareCRC32(Crc, Data);
> > >   return computeSoftwareCRC32(Crc, Data);
> > > #endif
> > > }
> > > ```
> > (without the 3rd param)
> yup
Just gave it a try, for the non-full-SSE the loop is unrolled but with a load & check of the algorithm at each iteration (2 for 64-bit, 3 for 32-bit):

```
                mov     rax, cs:_ZN7__scudoL6CookieE ; __scudo::Cookie
                mov     dl, cs:_ZN7__scudoL13HashAlgorithmE ; unsigned __int64
                cmp     dl, 1
                jnz     short loc_884B
                mov     edi, eax        ; this
                mov     rsi, rcx        ; unsigned int
                call    _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong)
                jmp     loc_88DB
loc_884B:                               ; CODE XREF: __scudo::ScudoChunk::computeChecksum(__scudo::UnpackedHeader *)+1Aj

... software crc32 ...

loc_88DB:                               ; CODE XREF: __scudo::ScudoChunk::computeChecksum(__scudo::UnpackedHeader *)+26j
                mov     cl, cs:_ZN7__scudoL13HashAlgorithmE ; __scudo::HashAlgorithm
                cmp     cl, 1
                jnz     short loc_88FC
                and     r14, 0FFFFFFFFFFFF0000h
                mov     edi, eax        ; this
                mov     rsi, r14        ; unsigned int
                call    _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong)
                jmp     loc_8982
loc_88FC:                               ; CODE XREF: __scudo::ScudoChunk::computeChecksum(__scudo::UnpackedHeader *)+C4j

... software crc32 ...
```

It seems somewhat messier. What do you think?


https://reviews.llvm.org/D32971





More information about the llvm-commits mailing list