[llvm] r289382 - [FileCheck] Re-implement the logic to find each check prefix in the

Sun Dec 11 17:28:52 PST 2016

Nice!

On Sun, Dec 11, 2016 at 4:49 AM, Chandler Carruth via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Author: chandlerc
> Date: Sun Dec 11 06:49:05 2016
> New Revision: 289382
>
> URL: http://llvm.org/viewvc/llvm-project?rev=289382&view=rev
> Log:
> [FileCheck] Re-implement the logic to find each check prefix in the
> check file to not be unreasonably slow in the face of multiple check
> prefixes.
>
> The previous logic would repeatedly scan potentially large portions of
> the check file looking for alternative prefixes. In the worst case this
> would scan most of the file looking for a rare prefix between every
> single occurance of a common prefix. Even if we bounded the scan, this
> would do bad things if the order of the prefixes was "unlucky" and the
> distant prefix was scanned for first.
>
> None of this is necessary. It is straightforward to build a state
> machine that recognizes the first, longest of the set of alternative
> prefixes. That is in fact exactly whan a regular expression does.
>
> This patch builds a regular expression once for the set of prefixes and
> then uses it to search incrementally for the next prefix. This requires
> some threading of state but actually makes the code dramatically
> simpler. I've also added a big comment describing the algorithm as it
> was not at all obvious to me when I started.
>
> With this patch, several previously pathological test cases in
> test/CodeGen/X86 are 5x and more faster. Overall, running all tests
> under test/CodeGen/X86 uses 10% less CPU after this, and because all the
> slowest tests were hitting this, finishes in 40% less wall time on my
> system (going from just over 5.38s to just over 3.23s) on a release
> build! This patch substantially improves the time of all 7 X86 tests
> that were in the top 20 reported by --time-tests, 5 of them are
> completely off the list and the remaining 2 are much lower. (Sadly, the
> new tests on the list include 2 new X86 ones that are slow for unrelated
> reasons, so the count stays at 4 of the top 20.)
>
> It isn't clear how much this helps debug builds in aggregate in part
> because of the noise, but it again makes mane of the slowest x86 tests
> significantly faster (10% or more improvement).
>
> Modified:
>     llvm/trunk/utils/FileCheck/FileCheck.cpp
>
> Modified: llvm/trunk/utils/FileCheck/FileCheck.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/utils/
> FileCheck/FileCheck.cpp?rev=289382&r1=289381&r2=289382&view=diff
> ============================================================
> ==================
> --- llvm/trunk/utils/FileCheck/FileCheck.cpp (original)
> +++ llvm/trunk/utils/FileCheck/FileCheck.cpp Sun Dec 11 06:49:05 2016
> @@ -744,93 +744,68 @@ static size_t SkipWord(StringRef Str, si
>    return Loc;
>  }
>
> -// Try to find the first match in buffer for any prefix. If a valid match
> is
> -// found, return that prefix and set its type and location.  If there are
> almost
> -// matches (e.g. the actual prefix string is found, but is not an actual
> check
> -// string), but no valid match, return an empty string and set the
> position to
> -// resume searching from. If no partial matches are found, return an empty
> -// string and the location will be StringRef::npos. If one prefix is a
> substring
> -// of another, the maximal match should be found. e.g. if "A" and "AA" are
> -// prefixes then AA-CHECK: should match the second one.
> -static StringRef FindFirstCandidateMatch(StringRef &Buffer,
> -                                         Check::CheckType &CheckTy,
> -                                         size_t &CheckLoc) {
> -  StringRef FirstPrefix;
> -  size_t FirstLoc = StringRef::npos;
> -  size_t SearchLoc = StringRef::npos;
> -  Check::CheckType FirstTy = Check::CheckNone;
> -
> -  CheckTy = Check::CheckNone;
> -  CheckLoc = StringRef::npos;
> -
> -  for (StringRef Prefix : CheckPrefixes) {
> -    size_t PrefixLoc = Buffer.find(Prefix);
> -
> -    if (PrefixLoc == StringRef::npos)
> -      continue;
> -
> -    // Track where we are searching for invalid prefixes that look almost
> right.
> -    // We need to only advance to the first partial match on the next
> attempt
> -    // since a partial match could be a substring of a later, valid
> prefix.
> -    // Need to skip to the end of the word, otherwise we could end up
> -    // matching a prefix in a substring later.
> -    if (PrefixLoc < SearchLoc)
> -      SearchLoc = SkipWord(Buffer, PrefixLoc);
> -
> -    // We only want to find the first match to avoid skipping some.
> -    if (PrefixLoc > FirstLoc)
> -      continue;
> -    // If one matching check-prefix is a prefix of another, choose the
> -    // longer one.
> -    if (PrefixLoc == FirstLoc && Prefix.size() < FirstPrefix.size())
> -      continue;
> -
> -    StringRef Rest = Buffer.drop_front(PrefixLoc);
> -    // Make sure we have actually found the prefix, and not a word
> containing
> -    // it. This should also prevent matching the wrong prefix when one is
> a
> -    // substring of another.
> -    if (PrefixLoc != 0 && IsPartOfWord(Buffer[PrefixLoc - 1]))
> -      FirstTy = Check::CheckNone;
> -    else
> -      FirstTy = FindCheckType(Rest, Prefix);
> -
> -    FirstLoc = PrefixLoc;
> -    FirstPrefix = Prefix;
> -  }
> -
> -  // If the first prefix is invalid, we should continue the search after
> it.
> -  if (FirstTy == Check::CheckNone) {
> -    CheckLoc = SearchLoc;
> -    return "";
> -  }
> -
> -  CheckTy = FirstTy;
> -  CheckLoc = FirstLoc;
> -  return FirstPrefix;
> -}
> -
> -static StringRef FindFirstMatchingPrefix(StringRef &Buffer,
> +/// Search the buffer for the first prefix in the prefix regular
> expression.
> +///
> +/// This searches the buffer using the provided regular expression,
> however it
> +/// enforces constraints beyond that:
> +/// 1) The found prefix must not be a suffix of something that looks like
> +///    a valid prefix.
> +/// 2) The found prefix must be followed by a valid check type suffix
> using \c
> +///    FindCheckType above.
> +///
> +/// The first match of the regular expression to satisfy these two is
> returned,
> +/// otherwise an empty StringRef is returned to indicate failure.
> +///
> +/// If this routine returns a valid prefix, it will also shrink \p Buffer
> to
> +/// start at the beginning of the returned prefix, increment \p
> LineNumber for
> +/// each new line consumed from \p Buffer, and set \p CheckTy to the type
> of
> +/// check found by examining the suffix.
> +///
> +/// If no valid prefix is found, the state of Buffer, LineNumber, and
> CheckTy
> +/// is unspecified.
> +static StringRef FindFirstMatchingPrefix(Regex &PrefixRE, StringRef
> &Buffer,
>                                           unsigned &LineNumber,
> -                                         Check::CheckType &CheckTy,
> -                                         size_t &CheckLoc) {
> -  while (!Buffer.empty()) {
> -    StringRef Prefix = FindFirstCandidateMatch(Buffer, CheckTy,
> CheckLoc);
> -    // If we found a real match, we are done.
> -    if (!Prefix.empty()) {
> -      LineNumber += Buffer.substr(0, CheckLoc).count('\n');
> -      return Prefix;
> -    }
> +                                         Check::CheckType &CheckTy) {
> +  SmallVector<StringRef, 2> Matches;
>
> -    // We didn't find any almost matches either, we are also done.
> -    if (CheckLoc == StringRef::npos)
> +  while (!Buffer.empty()) {
> +    // Find the first (longest) match using the RE.
> +    if (!PrefixRE.match(Buffer, &Matches))
> +      // No match at all, bail.
>        return StringRef();
>
> -    LineNumber += Buffer.substr(0, CheckLoc + 1).count('\n');
> +    StringRef Prefix = Matches[0];
> +    Matches.clear();
>
> -    // Advance to the last possible match we found and try again.
> -    Buffer = Buffer.drop_front(CheckLoc + 1);
> +    assert(Prefix.data() >= Buffer.data() &&
> +           Prefix.data() < Buffer.data() + Buffer.size() &&
> +           "Prefix doesn't start inside of buffer!");
> +    size_t Loc = Prefix.data() - Buffer.data();
> +    StringRef Skipped = Buffer.substr(0, Loc);
> +    Buffer = Buffer.drop_front(Loc);
> +    LineNumber += Skipped.count('\n');
> +
> +    // Check that the matched prefix isn't a suffix of some other
> check-like
> +    // word.
> +    // FIXME: This is a very ad-hoc check. it would be better handled in
> some
> +    // other way. Among other things it seems hard to distinguish between
> +    // intentional and unintentional uses of this feature.
> +    if (Skipped.empty() || !IsPartOfWord(Skipped.back())) {
> +      // Now extract the type.
> +      CheckTy = FindCheckType(Buffer, Prefix);
> +
> +      // If we've found a valid check type for this prefix, we're done.
> +      if (CheckTy != Check::CheckNone)
> +        return Prefix;
> +    }
> +
> +    // If we didn't successfully find a prefix, we need to skip this
> invalid
> +    // prefix and continue scanning. We directly skip the prefix that was
> +    // matched and any additional parts of that check-like word.
> +    Buffer = Buffer.drop_front(SkipWord(Buffer, Prefix.size()));
>    }
>
> +  // We ran out of buffer while skipping partial matches so give up.
>    return StringRef();
>  }
>
> @@ -838,7 +813,7 @@ static StringRef FindFirstMatchingPrefix
>  ///
>  /// The strings are added to the CheckStrings vector. Returns true in
> case of
>  /// an error, false otherwise.
> -static bool ReadCheckFile(SourceMgr &SM, StringRef Buffer,
> +static bool ReadCheckFile(SourceMgr &SM, StringRef Buffer, Regex
> &PrefixRE,
>                            std::vector<CheckString> &CheckStrings) {
>    std::vector<Pattern> ImplicitNegativeChecks;
>    for (const auto &PatternString : ImplicitCheckNot) {
> @@ -866,20 +841,20 @@ static bool ReadCheckFile(SourceMgr &SM,
>
>    while (1) {
>      Check::CheckType CheckTy;
> -    size_t PrefixLoc;
>
>      // See if a prefix occurs in the memory buffer.
> -    StringRef UsedPrefix =
> -        FindFirstMatchingPrefix(Buffer, LineNumber, CheckTy, PrefixLoc);
> +    StringRef UsedPrefix = FindFirstMatchingPrefix(PrefixRE, Buffer,
> LineNumber,
> +                                                   CheckTy);
>      if (UsedPrefix.empty())
>        break;
> -
> -    Buffer = Buffer.drop_front(PrefixLoc);
> +    assert(UsedPrefix.data() == Buffer.data() &&
> +           "Failed to move Buffer's start forward, or pointed prefix
> outside "
> +           "of the buffer!");
>
>      // Location to use for error messages.
> -    const char *UsedPrefixStart = Buffer.data() + (PrefixLoc == 0 ? 0 :
> 1);
> +    const char *UsedPrefixStart = UsedPrefix.data();
>
> -    // PrefixLoc is to the start of the prefix. Skip to the end.
> +    // Skip the buffer to the end.
>      Buffer = Buffer.drop_front(UsedPrefix.size() +
> CheckTypeSize(CheckTy));
>
>      // Complain about useful-looking but unsupported suffixes.
> @@ -1254,11 +1229,28 @@ static bool ValidateCheckPrefixes() {
>    return true;
>  }
>
> -// I don't think there's a way to specify an initial value for cl::list,
> -// so if nothing was specified, add the default
> -static void AddCheckPrefixIfNeeded() {
> +// Combines the check prefixes into a single regex so that we can
> efficiently
> +// scan for any of the set.
> +//
> +// The semantics are that the longest-match wins which matches our regex
> +// library.
> +static Regex buildCheckPrefixRegex() {
> +  // I don't think there's a way to specify an initial value for cl::list,
> +  // so if nothing was specified, add the default
>    if (CheckPrefixes.empty())
>      CheckPrefixes.push_back("CHECK");
> +
> +  // We already validated the contents of CheckPrefixes so just
> concatenate
> +  // them as alternatives.
> +  SmallString<32> PrefixRegexStr;
> +  for (StringRef Prefix : CheckPrefixes) {
> +    if (Prefix != CheckPrefixes.front())
> +      PrefixRegexStr.push_back('|');
> +
> +    PrefixRegexStr.append(Prefix);
> +  }
> +
> +  return Regex(PrefixRegexStr);
>  }
>
>  static void DumpCommandLine(int argc, char **argv) {
> @@ -1342,7 +1334,16 @@ int main(int argc, char **argv) {
>      return 2;
>    }
>
> -  AddCheckPrefixIfNeeded();
> +  Regex PrefixRE = buildCheckPrefixRegex();
> +  std::string REError;
> +  if (!PrefixRE.isValid(REError)) {
> +    errs() << "Unable to combine check-prefix strings into a prefix
> regular "
> +              "expression! This is likely a bug in FileCheck's
> verification of "
> +              "the check-prefix strings. Regular expression parsing
> failed "
> +              "with the following error: "
> +           << REError << "\n";
> +    return 2;
> +  }
>
>    SourceMgr SM;
>
> @@ -1364,7 +1365,7 @@ int main(int argc, char **argv) {
>                          SMLoc());
>
>    std::vector<CheckString> CheckStrings;
> -  if (ReadCheckFile(SM, CheckFileText, CheckStrings))
> +  if (ReadCheckFile(SM, CheckFileText, PrefixRE, CheckStrings))
>      return 2;
>
>    // Open the file to check and add it to SourceMgr.
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161211/76ed9eb2/attachment-0001.html>