[PATCH] D80586: Refactor Suffix Tree from MachineOutliner to LibSupport
Jessica Paquette via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 5 10:36:14 PDT 2020
paquette added a comment.
Some more testcase nits.
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:26-28
+ for (auto It = ST.begin(); It != ST.end(); It++) {
+ SubStrings.push_back(*It);
+ }
----------------
remove braces
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:29
+ }
+ ASSERT_EQ(SubStrings.size(), (unsigned)1);
+ EXPECT_EQ(SubStrings[0].Length, (unsigned)2);
----------------
static_cast?
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:40
+// at indices 0 and 4 as well as the substrings {2, 3} at indices 2 and 5.
+// This test also serves as a flag for improvements to the suffix tree. Right
+// now, the longest repeated substring from a specific index is returned, this
----------------
Put this as a FIXME, so people can find it when they grep for FIXMEs?
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:65
+ static_cast<unsigned>(3));
+ }
+ }
----------------
What happens if it's not 3 or 2? Should probably fail the test.
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:26-28
+ for (auto It = ST.begin(); It != ST.end(); It++) {
+ SubStrings.push_back(*It);
+ }
----------------
Style nit: remove braces around for?
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:29-31
+ ASSERT_EQ(SubStrings.size(), (unsigned)1);
+ EXPECT_EQ(SubStrings[0].Length, (unsigned)2);
+ EXPECT_EQ(SubStrings[0].StartIndices.size(), (unsigned)2);
----------------
Replace C-style casts with `static_cast`s? (If anything, it's just easier to search for `static_cast` than a C-style cast)
Actually, now that I think about it, why not just use, `1u`?
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:40-43
+// This test also serves as a flag for improvements to the suffix tree. Right
+// now, the longest repeated substring from a specific index is returned, this
+// could be improved to return the longest repeated substring, as well as those
+// that are smaller.
----------------
I would put this portion of the comment in a FIXME:
```
// Right now .... smaller.
```
Then if anyone runs into this, they can find it documented somewhere easily. Also, people looking for something to work on can find it easily.
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:52-60
+ if (RS.Length == static_cast<unsigned>(3)) {
+ for (unsigned StartIdx : RS.StartIndices) {
+ EXPECT_EQ(RepeatedRepetitionData[StartIdx], static_cast<unsigned>(1));
+ EXPECT_EQ(RepeatedRepetitionData[StartIdx + 1],
+ static_cast<unsigned>(2));
+ EXPECT_EQ(RepeatedRepetitionData[StartIdx + 2],
+ static_cast<unsigned>(3));
----------------
In this case, we should probably only find repeats of length 2 and 3, correct?
Can we check that as well? We should never get a repeat of length, say, 4, or 0, or, whatever.
E.g, something like this?
```
unsigned Len;
for (SuffixTree::RepeatedSubstring &RS : SubStrings) {
// The only strings we expect are {2, 3} and {1, 2, 3}.
// Check that the length of the string is something we expect before checking its contents.
Len = RS.Length;
bool IsExpectedLen = (Len == 3 || Len == 2);
ASSERT_TRUE(IsExpectedLen);
// Check the contents of the string.
if (Len == 3) {
...
} else {
...
}
}
```
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:53
+ if (RS.Length == static_cast<unsigned>(3)) {
+ for (unsigned StartIdx : RS.StartIndices) {
+ EXPECT_EQ(RepeatedRepetitionData[StartIdx], static_cast<unsigned>(1));
----------------
Test that the start indices are 0 and 4?
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:72
+// indices 0 and 1.
+// This could serve from the same improvements as test above
+TEST(SuffixTreeTest, TestSingleCharacterRepeat) {
----------------
Maybe a FIXME or TODO would be good here?
```
// FIXME: Add support for detecting {1, 1} and {1, 1, 1}
```
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:83
+ RepeatedRepetitionData.size() - RS.Length);
+ for (unsigned StartIdx : SubStrings[0].StartIndices) {
+ for (unsigned Idx = StartIdx; Idx < StartIdx + SubStrings[0].Length;
----------------
Test that the start index is 0 or 1?
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:84-86
+ for (unsigned Idx = StartIdx; Idx < StartIdx + SubStrings[0].Length;
+ Idx++)
+ EXPECT_EQ(RepeatedRepetitionData[Idx], static_cast<unsigned>(1));
----------------
Might be able to use `llvm::all_of` here?
Then you'd be able to write this like:
```
ASSERT_TRUE(all_of(std::make_range(...), [ ](...){ return Elt == 1; }));
```
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:91
+
+// Tests that the SuffixTree does not return any values for Tandem Repeat.
+// That is, for repetitions one after with other, without any difference, there
----------------
Most people don't know what a tandem repeat is.
Maybe something like this?
```
// The suffix tree cannot currently find repeated substrings in strings of the form
// {1, 2, 3, 1, 2, 3}, because the two {1, 2, 3}s are adjacent ("tandem repeats")
//
// FIXME: Teach the SuffixTree to recognize these cases.
```
================
Comment at: llvm/unittests/Support/SuffixTreeTest.cpp:116-118
+ for (unsigned Idx = StartIdx; Idx < StartIdx + RS.Length; Idx++) {
+ EXPECT_EQ(RepeatedRepetitionData[Idx], static_cast<unsigned>(1));
+ }
----------------
Remove braces?
Or, even better than using a for loop, try using `llvm::all_of`?
```
EXPECT_TRUE(all_of(std::make_range(...), [](...){ return Elt == 1; }))
```
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80586/new/
https://reviews.llvm.org/D80586
More information about the llvm-commits
mailing list