[llvm] [llvm] Improve implementation of StringRef::find_last_of and cie (PR #71865)

Mehdi Amini via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 12 15:18:24 PDT 2024


================
@@ -268,17 +268,70 @@ StringRef::size_type StringRef::find_first_not_of(StringRef Chars,
   return npos;
 }
 
+// See https://graphics.stanford.edu/~seander/bithacks.html#ValueInWord
+static inline uint64_t haszero(uint64_t v) {
+  return ~((((v & 0x7F7F7F7F7F7F7F7F) + 0x7F7F7F7F7F7F7F7F) | v) |
+           0x7F7F7F7F7F7F7F7F);
+}
+static inline uint64_t hasvalue(uint64_t x, char n) {
+  return haszero((x) ^ (~0UL / 255 * (n)));
+}
+
+/// This is a hot spot for some clangd operations, enough to be eligible to
+/// a pseudo - vectorized implementation.
+static StringRef::size_type
+vectorized_find_last_of_specialized(const char *Data, size_t Sz, char C0,
+                                    char C1) {
+  while (Sz >= 8) {
+    Sz -= 8;
+    uint64_t Buffer = 0;
+    std::memcpy((void *)&Buffer, (void *)(Data + Sz), sizeof(Buffer));
+    uint64_t Check = hasvalue(Buffer, C0) | hasvalue(Buffer, C1);
+    if (Check)
+      return Sz + 7 - llvm::countl_zero(Check) / 8;
+  }
+  if (Sz >= 4) {
+    Sz -= 4;
+    uint32_t Buffer = 0;
+    std::memcpy((void *)&Buffer, (void *)(Data + Sz), sizeof(Buffer));
+    uint64_t Check = hasvalue(Buffer, C0) | hasvalue(Buffer, C1);
+    if (Check)
+      return Sz + 7 - llvm::countl_zero(Check) / 8;
+  }
+  if (Sz >= 2) {
+    Sz -= 2;
+    uint16_t Buffer = 0;
+    std::memcpy((void *)&Buffer, (void *)(Data + Sz), sizeof(Buffer));
+    uint64_t Check = hasvalue(Buffer, C0) | hasvalue(Buffer, C1);
+    if (Check)
+      return Sz + 7 - llvm::countl_zero(Check) / 8;
+  }
+  if (Sz >= 1)
+    if (*Data == C0 || *Data == C1)
+      return 0;
+
+  return StringRef::npos;
+}
+
 /// find_last_of - Find the last character in the string that is in \arg C,
 /// or npos if not found.
 ///
-/// Note: O(size() + Chars.size())
+/// Note: O(size() + Chars.size()) for the generic case.
 StringRef::size_type StringRef::find_last_of(StringRef Chars,
                                              size_t From) const {
+  size_type Sz = std::min(From, Length);
+
+  if (Chars.size() == 2) {
----------------
joker-eph wrote:

It's a tradeoff between impact and cost of maintenance.

Moving it to clangd does not reduce the maintenance burden for the LLVM project though; are we concerned specifically with StringRef maintenance here?
On the contrary, I would say that moving it to clangd runs the risk that BOLT or LLDB reimplement the same thing, not knowing clangd has it.
So I'd say either the implementation isn't overly complex for long-term maintenance compared to the benefits, or it may not belong in the project at all?

I don't see any mention of the current perf impact on clangd though?
In https://github.com/llvm/llvm-project/pull/71865#issuecomment-1981763725 you mention a micro-benchmark, @serge-sans-paille, but not the result, as far as I can see?
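
For reference, a minimal standalone sketch of the technique under discussion, which could serve as the starting point for such a micro-benchmark. It is not the code from the PR: the names `has_zero_byte`, `has_byte`, `naive_find_last_of2`, `swar_find_last_of2` and `self_check` are hypothetical, it works on `std::string` rather than `StringRef`, and it assembles each 8-byte word by hand so that it does not depend on host endianness. It only checks that the word-at-a-time search agrees with a plain backward scan; timing code (for example with `std::chrono`) would have to be wrapped around the two functions to get actual numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t npos = static_cast<std::size_t>(-1);

// Exact zero-byte test: sets bit 7 of every byte of v that is zero and
// never flags a non-zero byte, so the highest flagged byte is reliable.
static inline std::uint64_t has_zero_byte(std::uint64_t v) {
  const std::uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
  return ~((((v & low7) + low7) | v) | low7);
}

// Flags every byte of x that is equal to c.
static inline std::uint64_t has_byte(std::uint64_t x, unsigned char c) {
  return has_zero_byte(x ^ (0x0101010101010101ULL * c));
}

// Reference implementation: plain backward scan.
std::size_t naive_find_last_of2(const std::string &s, char c0, char c1) {
  for (std::size_t i = s.size(); i-- > 0;)
    if (s[i] == c0 || s[i] == c1)
      return i;
  return npos;
}

// Word-at-a-time search for the last occurrence of c0 or c1 (hypothetical
// sketch; the tail is handled byte by byte instead of with 4/2/1 steps).
std::size_t swar_find_last_of2(const std::string &s, char c0, char c1) {
  const unsigned char u0 = static_cast<unsigned char>(c0);
  const unsigned char u1 = static_cast<unsigned char>(c1);
  std::size_t sz = s.size();
  while (sz >= 8) {
    sz -= 8;
    // Byte at offset i goes to bits [8*i, 8*i+7], whatever the endianness.
    std::uint64_t word = 0;
    for (int i = 0; i < 8; ++i)
      word |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[sz + i]))
              << (8 * i);
    const std::uint64_t hit = has_byte(word, u0) | has_byte(word, u1);
    if (hit)
      for (int i = 7; i >= 0; --i)
        if (hit & (static_cast<std::uint64_t>(0x80) << (8 * i)))
          return sz + i;
  }
  // Fewer than 8 bytes remain at the front of the string.
  for (std::size_t i = sz; i-- > 0;)
    if (s[i] == c0 || s[i] == c1)
      return i;
  return npos;
}

// Checks that both implementations agree on a few path-like inputs.
void self_check() {
  const char *inputs[] = {"", "a", "/", "a/b\\c", "no separators here",
                          "/usr/include/llvm/ADT/StringRef.h"};
  for (const char *in : inputs)
    assert(swar_find_last_of2(in, '/', '\\') ==
           naive_find_last_of2(in, '/', '\\'));
}
```

Calling `self_check()` and then timing both functions over a large set of representative paths would give one data point for the impact side of the tradeoff; whether that reflects real clangd workloads is a separate question.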

https://github.com/llvm/llvm-project/pull/71865