[PATCH] D32971: [scudo] CRC32 optimizations

Kostya Kortchinsky via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 8 14:34:00 PDT 2017


cryptoad added inline comments.


================
Comment at: lib/scudo/scudo_allocator.cpp:61
+#endif  // defined(__SSE4_2__) || defined(__ARM_FEATURE_CRC32)
 }
 
----------------
alekseyshl wrote:
> Now it feels logical to move this entire function along with HashAlgorithm var into scudo_crs32.cpp, but, I guess, not inlining it affects performance, right?
Yes.
And also I would lose the distinction of whether `__SSE4_2__` is defined everywhere (eg: in scudo_allocator.cpp) or only in scudo_crc32.cpp.


================
Comment at: lib/scudo/scudo_crc32.cpp:23
 }
 #endif  // defined(__SSE4_2__) || defined(__ARM_FEATURE_CRC32)
 
----------------
alekseyshl wrote:
> cryptoad wrote:
> > alekseyshl wrote:
> > > cryptoad wrote:
> > > > alekseyshl wrote:
> > > > > Please remind me, does it mean that computeHardwareCRC32 != 0 only when defined(__SSE4_2__) || defined(__ARM_FEATURE_CRC32)? Or is it expected to come from other libraries too?
> > > > computeHardwareCRC32 != 0 if `defined(__SSE4_2__) || defined(__ARM_FEATURE_CRC32)`, but it could also be defined by an external library, even though this is **not** the expectation. With full RELRO (`-Wl,-z,relro,-z,now`), if not defined at load, it can't be defined later.
> > > I see. I'm asking because in the current version of the code the same condition switches the top level code to CRC32_INTRINSIC, completely ignoring this function. I was curious if it is even necessary now.
> > > 
> > > It just does not add up. Even for the possible future other uses of computeCRC32, for the performance reasons we have to ifdef to INTRINSIC calls at the call site, which renders computeHardwareCRC32 useless, right?
> > I am going to try and clarify the situation. Let me know if something still doesn't make sense after!
> > 
> > Compilation wise, there are 3 cases:
> >   # `-msse4_2` (or ARM equivalent) is enabled for the whole project (eg: google3), instructions will be emitted accordingly, and we will only use hardware `crc32` (no weak function looked, only intrinsics);
> >   # `-msse4_2` (or ARM equivalent) is enabled for scudo_crc32.cpp only (eg: current cmake config if supported): SSE instructions are only emitted in this file, `computeHardwareCRC32` is defined, and can be used at runtime; no SSE instructions will be emitted anywhere else; this allows runtime detection and to have a HW enabled version that also runs on old hardware;
> >   # `-msse4_2` (or ARM equivalent) is not enabled (eg: current cmake config if not supported): `computeHardwareCRC32` is not defined.
> > 
> > Runtime wise, if the processor supports HW CRC32, then #1 or #2 will leverage it. If it doesn't, #1 will crash on illegal instruction, #2 will still work. #3 works in any case.
> > 
> > This patch will make #1 a lot cleaner ASM wise due to the call inline and the removal of the unnecessary load of `HashType`, #2 & #3 somewhat better due to the check removal for `computeHardwareCRC32` at each iteration.
> > 
> > So I think your point about the top level code completely ignoring it refers to #1, but we still need it for #2.
> Ah, now I remember the original reasoning behind this separation. Mentioning that in comments helps a lot. Thanks! 
Anytime!


https://reviews.llvm.org/D32971





More information about the llvm-commits mailing list