<div dir="ltr">Nice!</div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Dec 11, 2016 at 4:49 AM, Chandler Carruth via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: chandlerc<br>

Date: Sun Dec 11 06:49:05 2016<br>

New Revision: 289382<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=289382&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project?rev=289382&view=rev</a><br>

Log:<br>

[FileCheck] Re-implement the logic to find each check prefix in the<br>

check file to not be unreasonably slow in the face of multiple check<br>

prefixes.<br>

<br>

The previous logic would repeatedly scan potentially large portions of<br>

the check file looking for alternative prefixes. In the worst case this<br>

would scan most of the file looking for a rare prefix between every<br>

single occurance of a common prefix. Even if we bounded the scan, this<br>

would do bad things if the order of the prefixes was "unlucky" and the<br>

distant prefix was scanned for first.<br>

<br>

None of this is necessary. It is straightforward to build a state<br>

machine that recognizes the first, longest of the set of alternative<br>

prefixes. That is in fact exactly whan a regular expression does.<br>

<br>

This patch builds a regular expression once for the set of prefixes and<br>

then uses it to search incrementally for the next prefix. This requires<br>

some threading of state but actually makes the code dramatically<br>

simpler. I've also added a big comment describing the algorithm as it<br>

was not at all obvious to me when I started.<br>

<br>

With this patch, several previously pathological test cases in<br>

test/CodeGen/X86 are 5x and more faster. Overall, running all tests<br>

under test/CodeGen/X86 uses 10% less CPU after this, and because all the<br>

slowest tests were hitting this, finishes in 40% less wall time on my<br>

system (going from just over 5.38s to just over 3.23s) on a release<br>

build! This patch substantially improves the time of all 7 X86 tests<br>

that were in the top 20 reported by --time-tests, 5 of them are<br>

completely off the list and the remaining 2 are much lower. (Sadly, the<br>

new tests on the list include 2 new X86 ones that are slow for unrelated<br>

reasons, so the count stays at 4 of the top 20.)<br>

<br>

It isn't clear how much this helps debug builds in aggregate in part<br>

because of the noise, but it again makes mane of the slowest x86 tests<br>

significantly faster (10% or more improvement).<br>

<br>

Modified:<br>

    llvm/trunk/utils/FileCheck/<wbr>FileCheck.cpp<br>

<br>

Modified: llvm/trunk/utils/FileCheck/<wbr>FileCheck.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/utils/FileCheck/FileCheck.cpp?rev=289382&r1=289381&r2=289382&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/utils/<wbr>FileCheck/FileCheck.cpp?rev=<wbr>289382&r1=289381&r2=289382&<wbr>view=diff</a><br>

==============================<wbr>==============================<wbr>==================<br>

--- llvm/trunk/utils/FileCheck/<wbr>FileCheck.cpp (original)<br>

+++ llvm/trunk/utils/FileCheck/<wbr>FileCheck.cpp Sun Dec 11 06:49:05 2016<br>

@@ -744,93 +744,68 @@ static size_t SkipWord(StringRef Str, si<br>

   return Loc;<br>

 }<br>

<br>

-// Try to find the first match in buffer for any prefix. If a valid match is<br>

-// found, return that prefix and set its type and location.  If there are almost<br>

-// matches (e.g. the actual prefix string is found, but is not an actual check<br>

-// string), but no valid match, return an empty string and set the position to<br>

-// resume searching from. If no partial matches are found, return an empty<br>

-// string and the location will be StringRef::npos. If one prefix is a substring<br>

-// of another, the maximal match should be found. e.g. if "A" and "AA" are<br>

-// prefixes then AA-CHECK: should match the second one.<br>

-static StringRef FindFirstCandidateMatch(<wbr>StringRef &Buffer,<br>

-                                         Check::CheckType &CheckTy,<br>

-                                         size_t &CheckLoc) {<br>

-  StringRef FirstPrefix;<br>

-  size_t FirstLoc = StringRef::npos;<br>

-  size_t SearchLoc = StringRef::npos;<br>

-  Check::CheckType FirstTy = Check::CheckNone;<br>

-<br>

-  CheckTy = Check::CheckNone;<br>

-  CheckLoc = StringRef::npos;<br>

-<br>

-  for (StringRef Prefix : CheckPrefixes) {<br>

-    size_t PrefixLoc = Buffer.find(Prefix);<br>

-<br>

-    if (PrefixLoc == StringRef::npos)<br>

-      continue;<br>

-<br>

-    // Track where we are searching for invalid prefixes that look almost right.<br>

-    // We need to only advance to the first partial match on the next attempt<br>

-    // since a partial match could be a substring of a later, valid prefix.<br>

-    // Need to skip to the end of the word, otherwise we could end up<br>

-    // matching a prefix in a substring later.<br>

-    if (PrefixLoc < SearchLoc)<br>

-      SearchLoc = SkipWord(Buffer, PrefixLoc);<br>

-<br>

-    // We only want to find the first match to avoid skipping some.<br>

-    if (PrefixLoc > FirstLoc)<br>

-      continue;<br>

-    // If one matching check-prefix is a prefix of another, choose the<br>

-    // longer one.<br>

-    if (PrefixLoc == FirstLoc && Prefix.size() < FirstPrefix.size())<br>

-      continue;<br>

-<br>

-    StringRef Rest = Buffer.drop_front(PrefixLoc);<br>

-    // Make sure we have actually found the prefix, and not a word containing<br>

-    // it. This should also prevent matching the wrong prefix when one is a<br>

-    // substring of another.<br>

-    if (PrefixLoc != 0 && IsPartOfWord(Buffer[PrefixLoc - 1]))<br>

-      FirstTy = Check::CheckNone;<br>

-    else<br>

-      FirstTy = FindCheckType(Rest, Prefix);<br>

-<br>

-    FirstLoc = PrefixLoc;<br>

-    FirstPrefix = Prefix;<br>

-  }<br>

-<br>

-  // If the first prefix is invalid, we should continue the search after it.<br>

-  if (FirstTy == Check::CheckNone) {<br>

-    CheckLoc = SearchLoc;<br>

-    return "";<br>

-  }<br>

-<br>

-  CheckTy = FirstTy;<br>

-  CheckLoc = FirstLoc;<br>

-  return FirstPrefix;<br>

-}<br>

-<br>

-static StringRef FindFirstMatchingPrefix(<wbr>StringRef &Buffer,<br>

+/// Search the buffer for the first prefix in the prefix regular expression.<br>

+///<br>

+/// This searches the buffer using the provided regular expression, however it<br>

+/// enforces constraints beyond that:<br>

+/// 1) The found prefix must not be a suffix of something that looks like<br>

+///    a valid prefix.<br>

+/// 2) The found prefix must be followed by a valid check type suffix using \c<br>

+///    FindCheckType above.<br>

+///<br>

+/// The first match of the regular expression to satisfy these two is returned,<br>

+/// otherwise an empty StringRef is returned to indicate failure.<br>

+///<br>

+/// If this routine returns a valid prefix, it will also shrink \p Buffer to<br>

+/// start at the beginning of the returned prefix, increment \p LineNumber for<br>

+/// each new line consumed from \p Buffer, and set \p CheckTy to the type of<br>

+/// check found by examining the suffix.<br>

+///<br>

+/// If no valid prefix is found, the state of Buffer, LineNumber, and CheckTy<br>

+/// is unspecified.<br>

+static StringRef FindFirstMatchingPrefix(Regex &PrefixRE, StringRef &Buffer,<br>

                                          unsigned &LineNumber,<br>

-                                         Check::CheckType &CheckTy,<br>

-                                         size_t &CheckLoc) {<br>

-  while (!Buffer.empty()) {<br>

-    StringRef Prefix = FindFirstCandidateMatch(<wbr>Buffer, CheckTy, CheckLoc);<br>

-    // If we found a real match, we are done.<br>

-    if (!Prefix.empty()) {<br>

-      LineNumber += Buffer.substr(0, CheckLoc).count('\n');<br>

-      return Prefix;<br>

-    }<br>

+                                         Check::CheckType &CheckTy) {<br>

+  SmallVector<StringRef, 2> Matches;<br>

<br>

-    // We didn't find any almost matches either, we are also done.<br>

-    if (CheckLoc == StringRef::npos)<br>

+  while (!Buffer.empty()) {<br>

+    // Find the first (longest) match using the RE.<br>

+    if (!PrefixRE.match(Buffer, &Matches))<br>

+      // No match at all, bail.<br>

       return StringRef();<br>

<br>

-    LineNumber += Buffer.substr(0, CheckLoc + 1).count('\n');<br>

+    StringRef Prefix = Matches[0];<br>

+    Matches.clear();<br>

<br>

-    // Advance to the last possible match we found and try again.<br>

-    Buffer = Buffer.drop_front(CheckLoc + 1);<br>

+    assert(Prefix.data() >= Buffer.data() &&<br>

+           Prefix.data() < Buffer.data() + Buffer.size() &&<br>

+           "Prefix doesn't start inside of buffer!");<br>

+    size_t Loc = Prefix.data() - Buffer.data();<br>

+    StringRef Skipped = Buffer.substr(0, Loc);<br>

+    Buffer = Buffer.drop_front(Loc);<br>

+    LineNumber += Skipped.count('\n');<br>

+<br>

+    // Check that the matched prefix isn't a suffix of some other check-like<br>

+    // word.<br>

+    // FIXME: This is a very ad-hoc check. it would be better handled in some<br>

+    // other way. Among other things it seems hard to distinguish between<br>

+    // intentional and unintentional uses of this feature.<br>

+    if (Skipped.empty() || !IsPartOfWord(Skipped.back())) {<br>

+      // Now extract the type.<br>

+      CheckTy = FindCheckType(Buffer, Prefix);<br>

+<br>

+      // If we've found a valid check type for this prefix, we're done.<br>

+      if (CheckTy != Check::CheckNone)<br>

+        return Prefix;<br>

+    }<br>

+<br>

+    // If we didn't successfully find a prefix, we need to skip this invalid<br>

+    // prefix and continue scanning. We directly skip the prefix that was<br>

+    // matched and any additional parts of that check-like word.<br>

+    Buffer = Buffer.drop_front(SkipWord(<wbr>Buffer, Prefix.size()));<br>

   }<br>

<br>

+  // We ran out of buffer while skipping partial matches so give up.<br>

   return StringRef();<br>

 }<br>

<br>

@@ -838,7 +813,7 @@ static StringRef FindFirstMatchingPrefix<br>

 ///<br>

 /// The strings are added to the CheckStrings vector. Returns true in case of<br>

 /// an error, false otherwise.<br>

-static bool ReadCheckFile(SourceMgr &SM, StringRef Buffer,<br>

+static bool ReadCheckFile(SourceMgr &SM, StringRef Buffer, Regex &PrefixRE,<br>

                           std::vector<CheckString> &CheckStrings) {<br>

   std::vector<Pattern> ImplicitNegativeChecks;<br>

   for (const auto &PatternString : ImplicitCheckNot) {<br>

@@ -866,20 +841,20 @@ static bool ReadCheckFile(SourceMgr &SM,<br>

<br>

   while (1) {<br>

     Check::CheckType CheckTy;<br>

-    size_t PrefixLoc;<br>

<br>

     // See if a prefix occurs in the memory buffer.<br>

-    StringRef UsedPrefix =<br>

-        FindFirstMatchingPrefix(<wbr>Buffer, LineNumber, CheckTy, PrefixLoc);<br>

+    StringRef UsedPrefix = FindFirstMatchingPrefix(<wbr>PrefixRE, Buffer, LineNumber,<br>

+                                                   CheckTy);<br>

     if (UsedPrefix.empty())<br>

       break;<br>

-<br>

-    Buffer = Buffer.drop_front(PrefixLoc);<br>

+    assert(UsedPrefix.data() == Buffer.data() &&<br>

+           "Failed to move Buffer's start forward, or pointed prefix outside "<br>

+           "of the buffer!");<br>

<br>

     // Location to use for error messages.<br>

-    const char *UsedPrefixStart = Buffer.data() + (PrefixLoc == 0 ? 0 : 1);<br>

+    const char *UsedPrefixStart = UsedPrefix.data();<br>

<br>

-    // PrefixLoc is to the start of the prefix. Skip to the end.<br>

+    // Skip the buffer to the end.<br>

     Buffer = Buffer.drop_front(UsedPrefix.<wbr>size() + CheckTypeSize(CheckTy));<br>

<br>

     // Complain about useful-looking but unsupported suffixes.<br>

@@ -1254,11 +1229,28 @@ static bool ValidateCheckPrefixes() {<br>

   return true;<br>

 }<br>

<br>

-// I don't think there's a way to specify an initial value for cl::list,<br>

-// so if nothing was specified, add the default<br>

-static void AddCheckPrefixIfNeeded() {<br>

+// Combines the check prefixes into a single regex so that we can efficiently<br>

+// scan for any of the set.<br>

+//<br>

+// The semantics are that the longest-match wins which matches our regex<br>

+// library.<br>

+static Regex buildCheckPrefixRegex() {<br>

+  // I don't think there's a way to specify an initial value for cl::list,<br>

+  // so if nothing was specified, add the default<br>

   if (CheckPrefixes.empty())<br>

     CheckPrefixes.push_back("<wbr>CHECK");<br>

+<br>

+  // We already validated the contents of CheckPrefixes so just concatenate<br>

+  // them as alternatives.<br>

+  SmallString<32> PrefixRegexStr;<br>

+  for (StringRef Prefix : CheckPrefixes) {<br>

+    if (Prefix != CheckPrefixes.front())<br>

+      PrefixRegexStr.push_back('|');<br>

+<br>

+    PrefixRegexStr.append(Prefix);<br>

+  }<br>

+<br>

+  return Regex(PrefixRegexStr);<br>

 }<br>

<br>

 static void DumpCommandLine(int argc, char **argv) {<br>

@@ -1342,7 +1334,16 @@ int main(int argc, char **argv) {<br>

     return 2;<br>

   }<br>

<br>

-  AddCheckPrefixIfNeeded();<br>

+  Regex PrefixRE = buildCheckPrefixRegex();<br>

+  std::string REError;<br>

+  if (!PrefixRE.isValid(REError)) {<br>

+    errs() << "Unable to combine check-prefix strings into a prefix regular "<br>

+              "expression! This is likely a bug in FileCheck's verification of "<br>

+              "the check-prefix strings. Regular expression parsing failed "<br>

+              "with the following error: "<br>

+           << REError << "\n";<br>

+    return 2;<br>

+  }<br>

<br>

   SourceMgr SM;<br>

<br>

@@ -1364,7 +1365,7 @@ int main(int argc, char **argv) {<br>

                         SMLoc());<br>

<br>

   std::vector<CheckString> CheckStrings;<br>

-  if (ReadCheckFile(SM, CheckFileText, CheckStrings))<br>

+  if (ReadCheckFile(SM, CheckFileText, PrefixRE, CheckStrings))<br>

     return 2;<br>

<br>

   // Open the file to check and add it to SourceMgr.<br>

<br>

<br>

______________________________<wbr>_________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a><br>

</blockquote></div><br></div>