[llvm] r247269 - [ADT] Rewrite the StringRef::find implementation to be simpler, clearer,

Thu Sep 10 08:17:03 PDT 2015

On Thu, Sep 10, 2015 at 4:17 AM, Chandler Carruth via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Author: chandlerc
> Date: Thu Sep 10 06:17:49 2015
> New Revision: 247269
>
> URL: http://llvm.org/viewvc/llvm-project?rev=247269&view=rev
> Log:
> [ADT] Rewrite the StringRef::find implementation to be simpler, clearer,
> and tremendously less reliant on the optimizer to fix things.
>
> The code is always necessarily looking for the entire length of the
> string when doing the equality tests in this find implementation, but it
> previously was needlessly re-checking the size each time among other
> annoyances.
>
> By writing this so simply and directly in terms of memcmp, it also is
> about 8x faster in a debug build, which in turn makes FileCheck about 2x
> faster in 'ninja check-llvm'.

Should we deliberately build FileCheck optimized by default, even in debug
builds? I think we do something like that for llvm-tblgen; maybe we could
broaden that option to cover FileCheck as well?

> This saves about 8% of the time for
> FileCheck-heavy parts of the test suite like the x86 backend tests.
>
> Modified:
>     llvm/trunk/lib/Support/StringRef.cpp
>
> Modified: llvm/trunk/lib/Support/StringRef.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Support/StringRef.cpp?rev=247269&r1=247268&r2=247269&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Support/StringRef.cpp (original)
> +++ llvm/trunk/lib/Support/StringRef.cpp Thu Sep 10 06:17:49 2015
> @@ -140,37 +140,44 @@ std::string StringRef::upper() const {
>  /// \return - The index of the first occurrence of \arg Str, or npos if not
>  /// found.
>  size_t StringRef::find(StringRef Str, size_t From) const {
> +  if (From > Length)
> +    return npos;
> +
> +  const char *Needle = Str.data();
>    size_t N = Str.size();
> -  if (N > Length)
> +  if (N == 0)
> +    return From;
> +
> +  size_t Size = Length - From;
> +  if (Size < N)
>      return npos;
>
> +  const char *Start = Data + From;
> +  const char *Stop = Start + (Size - N + 1);
> +
>    // For short haystacks or unsupported needles fall back to the naive algorithm
> -  if (Length < 16 || N > 255 || N == 0) {
> -    for (size_t e = Length - N + 1, i = std::min(From, e); i != e; ++i)
> -      if (substr(i, N).equals(Str))
> -        return i;
> +  if (Size < 16 || N > 255) {
> +    do {
> +      if (std::memcmp(Start, Needle, N) == 0)
> +        return Start - Data;
> +      ++Start;
> +    } while (Start < Stop);
>      return npos;
>    }
>
> -  if (From >= Length)
> -    return npos;
> -
>    // Build the bad char heuristic table, with uint8_t to reduce cache thrashing.
>    uint8_t BadCharSkip[256];
>    std::memset(BadCharSkip, N, 256);
>    for (unsigned i = 0; i != N-1; ++i)
>      BadCharSkip[(uint8_t)Str[i]] = N-1-i;
>
> -  unsigned Len = Length-From, Pos = From;
> -  while (Len >= N) {
> -    if (substr(Pos, N).equals(Str)) // See if this is the correct substring.
> -      return Pos;
> +  do {
> +    if (std::memcmp(Start, Needle, N) == 0)
> +      return Start - Data;
>
>      // Otherwise skip the appropriate number of bytes.
> -    uint8_t Skip = BadCharSkip[(uint8_t)(*this)[Pos+N-1]];
> -    Len -= Skip;
> -    Pos += Skip;
> -  }
> +    Start += BadCharSkip[(uint8_t)Start[N-1]];
> +  } while (Start < Stop);
>
>    return npos;
>  }
>
>
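
For anyone reading the patch without an LLVM checkout, here is a minimal
standalone sketch of the same approach, assuming std::string_view (C++17)
in place of StringRef. The name findSketch is hypothetical and not part of
LLVM. It follows the structure of the new code above: early exits, a naive
memcmp loop for short haystacks or long needles, and otherwise a
bad-character skip table (Horspool-style) indexed by the last byte of the
current window.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper mirroring the rewritten StringRef::find; an
// illustration only, not the LLVM implementation.
std::size_t findSketch(std::string_view Haystack, std::string_view Needle,
                       std::size_t From = 0) {
  constexpr std::size_t npos = std::string_view::npos;
  if (From > Haystack.size())
    return npos;

  const std::size_t N = Needle.size();
  if (N == 0)
    return From; // An empty needle matches at the starting offset.

  const std::size_t Size = Haystack.size() - From;
  if (Size < N)
    return npos;

  const char *Data = Haystack.data();
  const char *Start = Data + From;
  const char *Stop = Start + (Size - N + 1); // One past the last valid window.

  // Short haystacks, or needles too long for a uint8_t skip: naive scan.
  if (Size < 16 || N > 255) {
    do {
      if (std::memcmp(Start, Needle.data(), N) == 0)
        return static_cast<std::size_t>(Start - Data);
      ++Start;
    } while (Start < Stop);
    return npos;
  }

  // Skip table: distance from each byte's last occurrence (excluding the
  // needle's final byte) to the end of the needle; N for absent bytes.
  assert(N <= 255 && "skip distances must fit in uint8_t");
  std::uint8_t BadCharSkip[256];
  std::memset(BadCharSkip, static_cast<int>(N), 256);
  for (std::size_t I = 0; I != N - 1; ++I)
    BadCharSkip[static_cast<std::uint8_t>(Needle[I])] =
        static_cast<std::uint8_t>(N - 1 - I);

  do {
    if (std::memcmp(Start, Needle.data(), N) == 0)
      return static_cast<std::size_t>(Start - Data);
    // Advance by the skip for the byte under the needle's last position.
    Start += BadCharSkip[static_cast<std::uint8_t>(Start[N - 1])];
  } while (Start < Stop);

  return npos;
}
```

Because Size >= N is established before either loop, each do/while body
runs at least once with a full N-byte window available, which is why the
per-iteration size check from the old code is no longer needed.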