[PATCH] D80586: Refactor Suffix Tree from MachineOutliner to LibSupport

Jessica Paquette via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 5 10:36:14 PDT 2020


paquette added a comment.

Some more testcase nits.



================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:26-28
+  for (auto It = ST.begin(); It != ST.end(); It++) {
+    SubStrings.push_back(*It);
+  }
----------------
remove braces


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:29
+  }
+  ASSERT_EQ(SubStrings.size(), (unsigned)1);
+  EXPECT_EQ(SubStrings[0].Length, (unsigned)2);
----------------
static_cast?


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:40
+// at indices 0 and 4 as well as the substrings {2, 3} at indices 2 and 5.
+// This test also serves as a flag for improvements to the suffix tree.  Right
+// now, the longest repeated substring from a specific index is returned, this
----------------
Put this as a FIXME, so people can find it when they grep for FIXMEs?


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:65
+                  static_cast<unsigned>(3));
+      }
+    }
----------------
What happens if it's not 3 or 2? Should probably fail the test.


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:26-28
+  for (auto It = ST.begin(); It != ST.end(); It++) {
+    SubStrings.push_back(*It);
+  }
----------------
Style nit: remove braces around for?


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:29-31
+  ASSERT_EQ(SubStrings.size(), (unsigned)1);
+  EXPECT_EQ(SubStrings[0].Length, (unsigned)2);
+  EXPECT_EQ(SubStrings[0].StartIndices.size(), (unsigned)2);
----------------
Replace C-style casts with `static_cast`s? (If anything, it's just easier to search for `static_cast` than a C-style cast)

Actually, now that I think about it, why not just use, `1u`?


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:40-43
+// This test also serves as a flag for improvements to the suffix tree.  Right
+// now, the longest repeated substring from a specific index is returned, this
+// could be improved to return the longest repeated substring, as well as those
+// that are smaller.
----------------
I would put this portion of the comment in a FIXME:

```
// Right now .... smaller.
```

Then if anyone runs into this, they can find it documented somewhere easily. Also, people looking for something to work on can find it easily.


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:52-60
+    if (RS.Length == static_cast<unsigned>(3)) {
+      for (unsigned StartIdx : RS.StartIndices) {
+        EXPECT_EQ(RepeatedRepetitionData[StartIdx], static_cast<unsigned>(1));
+        EXPECT_EQ(RepeatedRepetitionData[StartIdx + 1],
+                  static_cast<unsigned>(2));
+        EXPECT_EQ(RepeatedRepetitionData[StartIdx + 2],
+                  static_cast<unsigned>(3));
----------------
In this case, we should probably only find repeats of length 2 and 3, correct?

Can we check that as well? We should never get a repeat of length, say, 4, or 0, or, whatever.

E.g, something like this?

```
unsigned Len;
for (SuffixTree::RepeatedSubstring &RS : SubStrings) {
  // The only strings we expect are {2, 3} and {1, 2, 3}.
  // Check that the length of the string is something we expect before checking its contents.
  Len = RS.Length;
  bool IsExpectedLen = (Len == 3 || Len == 2);
  ASSERT_TRUE(IsExpectedLen);

  // Check the contents of the string.
  if (Len == 3) {
   ...
  } else {
   ...
  }
}
```



================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:53
+    if (RS.Length == static_cast<unsigned>(3)) {
+      for (unsigned StartIdx : RS.StartIndices) {
+        EXPECT_EQ(RepeatedRepetitionData[StartIdx], static_cast<unsigned>(1));
----------------
Test that the start indices are 0 and 4?


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:72
+// indices 0 and 1.
+// This could serve from the same improvements as test above
+TEST(SuffixTreeTest, TestSingleCharacterRepeat) {
----------------
Maybe a FIXME or TODO would be good here?

```
// FIXME: Add support for detecting {1, 1} and {1, 1, 1}
```



================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:83
+              RepeatedRepetitionData.size() - RS.Length);
+    for (unsigned StartIdx : SubStrings[0].StartIndices) {
+      for (unsigned Idx = StartIdx; Idx < StartIdx + SubStrings[0].Length;
----------------
Test that the start index is 0 or 1?


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:84-86
+      for (unsigned Idx = StartIdx; Idx < StartIdx + SubStrings[0].Length;
+           Idx++)
+        EXPECT_EQ(RepeatedRepetitionData[Idx], static_cast<unsigned>(1));
----------------
Might be able to use `llvm::all_of` here?

Then you'd be able to write this like:

```
ASSERT_TRUE(all_of(std::make_range(...), [ ](...){ return Elt == 1; }));
```


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:91
+
+// Tests that the SuffixTree does not return any values for Tandem Repeat.
+// That is, for repetitions one after with other, without any difference, there
----------------
Most people don't know what a tandem repeat is.

Maybe something like this?

```
// The suffix tree cannot currently find repeated substrings in strings of the form
// {1, 2, 3, 1, 2, 3}, because the two {1, 2, 3}s are adjacent ("tandem repeats")
//
// FIXME: Teach the SuffixTree to recognize these cases.
```


================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:116-118
+      for (unsigned Idx = StartIdx; Idx < StartIdx + RS.Length; Idx++) {
+        EXPECT_EQ(RepeatedRepetitionData[Idx], static_cast<unsigned>(1));
+      }
----------------
Remove braces?

Or, even better than using a for loop, try using `llvm::all_of`?

```
EXPECT_TRUE(all_of(std::make_range(...), [](...){ return Elt == 1; }))
```



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80586/new/

https://reviews.llvm.org/D80586





More information about the llvm-commits mailing list