[PATCH] D61178: caseFoldingDjbHash: simplify and make the US-ASCII fast path faster

Fangrui Song via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 26 02:49:34 PDT 2019


MaskRay created this revision.
MaskRay added reviewers: labath, JDevlieghere, aprantl, probinson, dblaikie.
Herald added subscribers: llvm-commits, kristina.
Herald added a project: LLVM.

The slow path (taken when the input contains at least one non-US-ASCII
byte) will be slower, because it rehashes the buffer from the saved seed,
but that doesn't matter.
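
For readers without the LLVM tree at hand, the fast path can be sketched
in isolation as below. This is only an illustration, not the patched
function: the name asciiCaseFoldingDjbHash, the std::optional return, and
the default seed of 5381 (the conventional DJB seed) are assumptions made
for this sketch. It hashes every byte as if it were US-ASCII and reports
afterwards whether that assumption held.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of LLVM. Hashes Buffer as if every byte were
// US-ASCII, folding 'A'..'Z' to lowercase. Returns std::nullopt if any byte
// above 0x7f was seen; a real caller would then restart from the saved seed
// and take the Unicode-aware slow path.
std::optional<uint32_t> asciiCaseFoldingDjbHash(std::string_view Buffer,
                                                uint32_t H = 5381) {
  bool ASCII = true;
  for (unsigned char C : Buffer) {
    // H * 33 yields the same value as (H << 5) + H in uint32_t arithmetic.
    H = H * 33 + ('A' <= C && C <= 'Z' ? C - 'A' + 'a' : C);
    ASCII &= C <= 0x7f;
  }
  if (!ASCII)
    return std::nullopt; // the hash computed so far is not usable
  return H;
}

// Small usage example: case differences in ASCII input do not change the hash.
void asciiCaseFoldingExample() {
  assert(asciiCaseFoldingDjbHash("Main") == asciiCaseFoldingDjbHash("main"));
  assert(!asciiCaseFoldingDjbHash("caf\xc3\xa9").has_value());
}
```

The loop body has no early exit and no per-byte branch on the ASCII check,
which is what makes the common case cheap; the cost is that a buffer with
non-ASCII bytes is walked twice.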


Repository:
  rL LLVM

https://reviews.llvm.org/D61178

Files:
  lib/Support/DJB.cpp


Index: lib/Support/DJB.cpp
===================================================================
--- lib/Support/DJB.cpp
+++ lib/Support/DJB.cpp
@@ -57,29 +57,22 @@
   return sys::unicode::foldCharSimple(C);
 }
 
-static uint32_t caseFoldingDjbHashCharSlow(StringRef &Buffer, uint32_t H) {
-  UTF32 C = chopOneUTF32(Buffer);
-
-  C = foldCharDwarf(C);
-
-  std::array<UTF8, UNI_MAX_UTF8_BYTES_PER_CODE_POINT> Storage;
-  StringRef Folded = toUTF8(C, Storage);
-  return djbHash(Folded, H);
-}
-
 uint32_t llvm::caseFoldingDjbHash(StringRef Buffer, uint32_t H) {
+  uint32_t SavedH = H;
+  bool ASCII = true;
+  for (unsigned char C: Buffer) {
+    H = H * 33 + ('A' <= C && C <= 'Z' ? C - 'A' + 'a' : C);
+    ASCII &= C <= 0x7f;
+  }
+  if (ASCII)
+    return H;
+
+  std::array<UTF8, UNI_MAX_UTF8_BYTES_PER_CODE_POINT> Storage;
+  H = SavedH;
   while (!Buffer.empty()) {
-    unsigned char C = Buffer.front();
-    if (LLVM_LIKELY(C <= 0x7f)) {
-      // US-ASCII, encoded as one character in utf-8.
-      // This is by far the most common case, so handle this specially.
-      if (C >= 'A' && C <= 'Z')
-        C = 'a' + (C - 'A'); // fold uppercase into lowercase
-      H = (H << 5) + H + C;
-      Buffer = Buffer.drop_front();
-      continue;
-    }
-    H = caseFoldingDjbHashCharSlow(Buffer, H);
+    UTF32 C = foldCharDwarf(chopOneUTF32(Buffer));
+    StringRef Folded = toUTF8(C, Storage);
+    H = djbHash(Folded, H);
   }
   return H;
 }



