[PATCH] D32971: [scudo] CRC32 optimizations

Mon May 8 12:54:32 PDT 2017

dvyukov added inline comments.

================
Comment at: lib/scudo/scudo_allocator.cpp:39
-
-INLINE u32 computeCRC32(u32 Crc, uptr Data, u8 HashType) {
-  // If SSE4.2 is defined here, it was enabled everywhere, as opposed to only
----------------
cryptoad wrote:
> cryptoad wrote:
> > dvyukov wrote:
> > > cryptoad wrote:
> > > > dvyukov wrote:
> > > > > It looked nice to have a separate function that wraps all of this logic. Why do you inline it?
> > > > > I would just move the load of HashAlgorithm into this function so that it does not happen when not needed.
> > > > Indeed, I considered this as well.
> > > > The other advantage of doing it the new way was to have the `crc32` instructions inlined as opposed to be calls (which is another part of the performance gain), which required me to use the intrinsic within the loop. After duplicating the loop with intrinsic, I didn't see a need to keep that function.
> > > Do you can have intrinsics in the inlinable computeCRC32 function. It should lead to the same assembly.
> > > With intrinsics it becomes more complex, so there is even more reason to have a separate function for it. Consider that we want to calculate crc in a another place.
> > So something like:
> > ```
> > INLINE u32 computeCRC32(u32 Crc, uptr Data, u8 HashType) {
> > #if defined(__SSE4_2__) || defined(__ARM_FEATURE_CRC32)
> >   return CRC32_INTRINSIC(Crc, Data);
> > #else
> >   if (atomic_load_relaxed(&HashAlgorithm) == CRC32Hardware)
> >     return computeHardwareCRC32(Crc, Data);
> >   return computeSoftwareCRC32(Crc, Data);
> > #endif
> > }
> > ```
> (without the 3rd param)
yup

https://reviews.llvm.org/D32971