<div dir="ltr">Sorry, that last patch didn't handle endianness very well.  Here's an updated patch that uses llvm::support::endian.  It assumes unaligned input, which is the safer choice, since I have no idea whether one can expect aligned input to this function.  It also wouldn't take much to process the first &lt;=3 bytes one at a time, then blast through the bulk assuming aligned reads, and then finish up with the last &lt;=3 bytes.  Let me know if you prefer that.<br><br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 12, 2017 at 8:49 PM, Scott Smith <span dir="ltr">&lt;<a href="mailto:scott.smith@purestorage.com" target="_blank">scott.smith@purestorage.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>LLDB relies heavily on CRC when loading shared libraries.  The existing implementation is quite slow because it computes one byte at a time, creating a long dependency chain.<br><br></div>Unfortunately the polynomial is not the same as the one implemented by x86 processors in SSE 4.2, but there's another way to make it faster: use more lookup tables.<br><br></div>Zlib implements this, but rather than require zlib, I added the relevant code to compute four bytes at a time in parallel.<br><br></div>A separate patch changes lldb to rely on JamCRC instead of its own implementation.  This patch improves performance, bringing my test (starting lldb, breaking at main) from 47 seconds down to 36 seconds.<br><br></div>

</blockquote></div><br></div>
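<div dir="ltr">For readers following along, here is a minimal sketch of the four-bytes-at-a-time table technique described above. It is not the patch itself. The function names (crcTables, loadLE32, crcUpdateBytewise, crcUpdateSlice4) are hypothetical, and a portable byte-wise little-endian load stands in for llvm::support::endian so the example has no LLVM dependency. It assumes the reflected CRC-32 polynomial 0xEDB88320 and no final XOR, as in JamCRC.</div>

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using CrcTables = std::array<std::array<uint32_t, 256>, 4>;

// Build four 256-entry tables for the reflected polynomial 0xEDB88320.
// Table 0 is the classic byte-at-a-time table; table t advances a byte
// through t additional zero bytes.
static const CrcTables &crcTables() {
  static const CrcTables T = [] {
    CrcTables Tab{};
    for (uint32_t i = 0; i < 256; ++i) {
      uint32_t c = i;
      for (int k = 0; k < 8; ++k)
        c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
      Tab[0][i] = c;
    }
    for (uint32_t i = 0; i < 256; ++i)
      for (int t = 1; t < 4; ++t)
        Tab[t][i] = (Tab[t - 1][i] >> 8) ^ Tab[0][Tab[t - 1][i] & 0xFF];
    return Tab;
  }();
  return T;
}

// Read 4 bytes as a little-endian word. Safe for unaligned pointers and
// independent of host byte order.
static uint32_t loadLE32(const unsigned char *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}

// Reference version: one table lookup per byte, each depending on the last.
uint32_t crcUpdateBytewise(uint32_t crc, const unsigned char *p, size_t n) {
  const CrcTables &T = crcTables();
  while (n--)
    crc = T[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
  return crc;
}

// Four bytes per iteration: the four lookups are independent of each other,
// which shortens the dependency chain. Trailing 0-3 bytes go one at a time.
uint32_t crcUpdateSlice4(uint32_t crc, const unsigned char *p, size_t n) {
  const CrcTables &T = crcTables();
  for (; n >= 4; p += 4, n -= 4) {
    crc ^= loadLE32(p);
    crc = T[3][crc & 0xFF] ^ T[2][(crc >> 8) & 0xFF] ^
          T[1][(crc >> 16) & 0xFF] ^ T[0][crc >> 24];
  }
  while (n--)
    crc = T[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
  return crc;
}
```

<div dir="ltr">Starting from an initial value of 0xFFFFFFFF, both functions should return the same result for any buffer, which makes the byte-wise version a convenient cross-check. The aligned variant mentioned above would add a short byte-wise prologue until the pointer is 4-byte aligned and then use direct word loads in the main loop.</div>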