[PATCH] D42740: Implement a case-folding version of DJB hash

Pavel Labath via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 14 03:51:45 PST 2018


labath reopened this revision.
labath added inline comments.
This revision is now accepted and ready to land.


================
Comment at: llvm/trunk/lib/Support/DJB.cpp:27
 uint32_t llvm::djbHash(StringRef Buffer, uint32_t H) {
   for (char C : Buffer.bytes())
+    H = djbHashChar(C, H);
----------------
The know values tests have already payed off in that they detected that the result of the hash function depends on the signedness of `char` (which did seem a bit dodgy to me when I looked at it, but I assumed we knew what we were doing).

For the new function the fix is obvious (as unsigned char, as dwarf5 specifies), but it's not clear to me how to handle the existing hash function, whose broken hashes are already in the wild.

@aprantl, @JDevlieghere: How should we handle this? This is only used on apple platforms, and I expect most of apple code to be compiled on x86 hosts (right?), which has char signed, so explicitly setting this to signed for the apple hash would probably have the least impact while maintaining consistency for the future. OTOH, lldb already implements this hash functions via unsigned char, and this discrepancy has gone unnoticed, so maybe it does not matter? (This effect is only visible on char >=128, so US-ASCII is fine, but any utf8 character would trigger this).


Repository:
  rL LLVM

https://reviews.llvm.org/D42740





More information about the llvm-commits mailing list