[PATCH] D42740: Implement a case-folding version of DJB hash

Thu Feb 8 10:47:17 PST 2018

joerg added inline comments.

================
Comment at: lib/Support/UnicodeCaseFold.cpp:26
+    return C + 32;
+  // 24 characters
+  if (C >= 0x0100 && C <= 0x012e)
----------------
labath wrote:
> joerg wrote:
> > Given that this should be applied to symbol names a lot, I would explicitly make the ASCII range fully covered to avoid all the other branches from getting triggered.
> There's a fast path in `caseFoldingDjbHash` for 7-bit ascii which should cover all reasonable symbol names. That will short circuit the case folding before even this function gets invoked (skipping all utf8 decode-recode logic, etc). Is that what you were looking for?
Partially. I wonder whether it would be better in general to restructure the generated output into fully distinct, ordered ranges. Otherwise I don't think performance will be very good, specifically for characters that are not upper-case letters, since they fall through every range check. It would also be nice if the code generator didn't emit obviously pointless output like the `C % 1 == 0` above. Purely in terms of branches with simple codegen, the following is no worse:
```
   if (C < 0x0041) ...
   if (C <= 0x005a) ...
   if (C <= 0xc0) ...
```
and folding the parts of the switch into the appropriate ranges.
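
To make the suggestion concrete, here is a minimal sketch of what range-ordered output could look like for the ASCII and Latin-1 blocks only. The function name `foldLatin1Ordered` is hypothetical and this is not the code in the patch; it assumes simple case folding (U+00B5 maps to U+03BC, U+00D7 is unchanged) and uses `C < 0x00c0` as the third boundary so that U+00C0 itself still gets folded. Everything above U+00DE is returned unchanged here, which a real generator would of course continue with further ranges.

```cpp
#include <cstdint>

// Hypothetical sketch: ascending range checks, so each code point exits at
// the first comparison that matches instead of testing unrelated ranges.
// Only ASCII and Latin-1 are handled; higher ranges are left out.
constexpr uint32_t foldLatin1Ordered(uint32_t C) {
  if (C < 0x0041)   // below 'A': digits, punctuation, control characters
    return C;
  if (C <= 0x005a)  // 'A'..'Z' -> 'a'..'z'
    return C + 32;
  if (C < 0x00c0)   // 0x5b..0xbf: the former switch case for U+00B5 lives here
    return C == 0x00b5 ? 0x03bc : C;
  if (C <= 0x00de)  // 0xc0..0xde fold by +32, except the multiplication sign
    return C == 0x00d7 ? C : C + 32;
  return C;         // sketch stops here; further ranges would follow
}
```

With this shape a lower-case ASCII letter is resolved after three comparisons, and the one-off special cases are only tested inside the range that contains them.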

Repository:
  rL LLVM

https://reviews.llvm.org/D42740