[llvm-commits] [llvm] r142061 - in /llvm/trunk: lib/Support/StringRef.cpp unittests/ADT/StringRefTest.cpp
Benjamin Kramer
benny.kra at googlemail.com
Sat Oct 15 03:08:31 PDT 2011
Author: d0k
Date: Sat Oct 15 05:08:31 2011
New Revision: 142061
URL: http://llvm.org/viewvc/llvm-project?rev=142061&view=rev
Log:
Add a bad char heuristic to StringRef::find.
Based on Horspool's simplified version of Boyer-Moore. We use a constant-sized table of
uint8_ts to keep cache thrashing low; needles bigger than 255 bytes are uncommon anyway.
The worst case is still O(n*m), but the average case is now considerably faster.
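
For readers unfamiliar with the technique, the approach described in the log can be sketched as a standalone function. This is a minimal illustration, not the committed code: horspoolFind is a hypothetical name, it operates on std::string_view rather than StringRef, and it assumes C++17. Unlike the patch, which initializes Len to Length, the sketch counts the remaining bytes from the starting offset, so the search window cannot extend past the end of the haystack when From is nonzero.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical free function illustrating Horspool's bad character
// heuristic; it is not the StringRef::find member from the patch.
std::size_t horspoolFind(std::string_view Haystack, std::string_view Needle,
                         std::size_t From = 0) {
  constexpr std::size_t npos = std::string_view::npos;
  const std::size_t N = Needle.size();
  if (N > Haystack.size())
    return npos;

  // Short haystacks, empty needles, and needles too long for an 8-bit
  // skip value use the naive scan (same condition as the patch).
  if (Haystack.size() < 16 || N > 255 || N == 0) {
    const std::size_t E = Haystack.size() - N + 1;
    for (std::size_t I = std::min(From, E); I != E; ++I)
      if (Haystack.substr(I, N) == Needle)
        return I;
    return npos;
  }

  // BadCharSkip[C] is how far the window may shift when its last byte is C.
  // Bytes that do not occur in the first N-1 needle positions shift by N.
  std::array<std::uint8_t, 256> BadCharSkip;
  BadCharSkip.fill(static_cast<std::uint8_t>(N));
  for (std::size_t I = 0; I + 1 < N; ++I)
    BadCharSkip[static_cast<std::uint8_t>(Needle[I])] =
        static_cast<std::uint8_t>(N - 1 - I);

  std::size_t Pos = std::min(From, Haystack.size());
  std::size_t Len = Haystack.size() - Pos; // bytes remaining at or after Pos
  while (Len >= N) {
    if (Haystack.substr(Pos, N) == Needle)
      return Pos;
    // Shift by the table entry for the last byte of the current window.
    // Skip is at least 1 and at most N, so Len cannot underflow.
    const std::uint8_t Skip =
        BadCharSkip[static_cast<std::uint8_t>(Haystack[Pos + N - 1])];
    Len -= Skip;
    Pos += Skip;
  }
  return npos;
}
```

With the LongStr value from the unit test in this commit, the sketch should return 36 for "hello" and 12 for "hell" starting at offset 2, matching the expectations added to StringRefTest.cpp.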
Modified:
llvm/trunk/lib/Support/StringRef.cpp
llvm/trunk/unittests/ADT/StringRefTest.cpp
Modified: llvm/trunk/lib/Support/StringRef.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Support/StringRef.cpp?rev=142061&r1=142060&r2=142061&view=diff
==============================================================================
--- llvm/trunk/lib/Support/StringRef.cpp (original)
+++ llvm/trunk/lib/Support/StringRef.cpp Sat Oct 15 05:08:31 2011
@@ -144,9 +144,32 @@
size_t N = Str.size();
if (N > Length)
return npos;
- for (size_t e = Length - N + 1, i = min(From, e); i != e; ++i)
- if (substr(i, N).equals(Str))
- return i;
+
+ // For short haystacks or unsupported needles fall back to the naive algorithm
+ if (Length < 16 || N > 255 || N == 0) {
+ for (size_t e = Length - N + 1, i = min(From, e); i != e; ++i)
+ if (substr(i, N).equals(Str))
+ return i;
+ return npos;
+ }
+
+ // Build the bad char heuristic table, with uint8_t to reduce cache thrashing.
+ uint8_t BadCharSkip[256];
+ std::memset(BadCharSkip, N, 256);
+ for (unsigned i = 0; i != N-1; ++i)
+ BadCharSkip[(uint8_t)Str[i]] = N-1-i;
+
+ unsigned Len = Length, Pos = min(From, Length);
+ while (Len >= N) {
+ if (substr(Pos, N).equals(Str)) // See if this is the correct substring.
+ return Pos;
+
+ // Otherwise skip the appropriate number of bytes.
+ uint8_t Skip = BadCharSkip[(uint8_t)Data[Pos+N-1]];
+ Len -= Skip;
+ Pos += Skip;
+ }
+
return npos;
}
Modified: llvm/trunk/unittests/ADT/StringRefTest.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/unittests/ADT/StringRefTest.cpp?rev=142061&r1=142060&r2=142061&view=diff
==============================================================================
--- llvm/trunk/unittests/ADT/StringRefTest.cpp (original)
+++ llvm/trunk/unittests/ADT/StringRefTest.cpp Sat Oct 15 05:08:31 2011
@@ -245,6 +245,12 @@
EXPECT_EQ(StringRef::npos, Str.find("zz"));
EXPECT_EQ(2U, Str.find("ll", 2));
EXPECT_EQ(StringRef::npos, Str.find("ll", 3));
+ EXPECT_EQ(0U, Str.find(""));
+ StringRef LongStr("hellx xello hell ello world foo bar hello");
+ EXPECT_EQ(36U, LongStr.find("hello"));
+ EXPECT_EQ(28U, LongStr.find("foo"));
+ EXPECT_EQ(12U, LongStr.find("hell", 2));
+ EXPECT_EQ(0U, LongStr.find(""));
EXPECT_EQ(3U, Str.rfind('l'));
EXPECT_EQ(StringRef::npos, Str.rfind('z'));