[LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)

Chandler Carruth chandlerc at google.com
Wed Feb 29 01:35:25 PST 2012


Thanks for the feedback thus far!

I've updated the header file, and enclosed a complete patch against LLVM.
This passes all the regtests, and I'll be doing more thorough testing of
LLVM itself with the patch. I've included some basic unit tests, but could
probably do more here... Not sure it's worth delaying the initial
submission though, as the best testing is to use a hash testing suite...

I've addressed the comments from Nadav, but there may be more minor
stylistic issues or typos. Please keep pointing them out! I appreciate the
help there.

I've also corrected my stub for the execution-seed to be more correct and
to include the ability to override it during program startup. It still
doesn't go the final step to make it actually change on each execution, but
is now otherwise identical. (In particular, it is no longer a compile-time
constant that can be folded.) This regressed the performance a tiny bit,
however...

There are several improvements to the implementation of the hashing
algorithm itself. The way the hashing included the seed hurt performance
quite a bit. I've re-worked it to be much lighter weight, and even after
the execution seed fix above slowed things down, this speeds everything up
more. We shave 4ns off the 4 to 8 byte case, bringing performance above
that of essentially every high quality hashing algorithm...

I still think we can do more, but it's already much faster than the
existing LLVM one except for the issue Tobias pointed out w/ modulo-4 key
sizes. I'm going to investigate this, but it may be a consequence of a
weakness in that algorithm.

I've re-run the SMHasher quality testing suite and the results are very
good. There are a few problems identified, but these are long-known
problems with CityHash in general, and have proven to thus far not be a
cause of real-world issues... I'd like to fix them, but it doesn't seem a
high priority yet.

Finally, I've run some rough initial numbers for x86 32-bit, and it's
surprisingly good. The relative speeds of this algorithm and others don't
seem to change much. A bit suspicious of that, so going to do more thorough
analysis.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120229/31e44981/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Hashing.h
Type: text/x-chdr
Size: 28547 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120229/31e44981/attachment.h>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hashing.diff.gz
Type: application/x-gzip
Size: 13558 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120229/31e44981/attachment.bin>


More information about the llvm-dev mailing list