[PATCH] D20795: Added basic capabilities to detect source code clones.

Artem Dergachev via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 11 10:32:45 PDT 2016


NoQ added a comment.

Your code is well-written and easy to understand, and makes me want to use it on my code! Added some minor comments, and there seems to be a small logic error in the compound statement hasher.

Not sure if that was already explained in detail, but i seem to agree that the only point for this project to integrate into Clang Static Analyzer is to integrate with the `BugReporter` facility - at least optionally: it would allow the reports of the clone detector checker to be gathered and viewed in a manner similar to other checkers - i.e. through scan-build/scan-view, probably CodeChecker and other tools. That should make the check easier to run and attract more users. However, the `BugReporter` mechanism is tweaked for the analyzer's path-sensitive checkers, so it'd need to be modified to suit your purpose, so in my opinion this is not critical for the initial commit.


================
Comment at: lib/AST/CloneDetection.cpp:52
@@ +51,3 @@
+      Other.getEndLoc() == getEndLoc();
+  return StartIsInBounds && EndIsInBounds;
+}
----------------
Perhaps we could early-return when we see that `StartIsInBounds` is false (for performance, because `isBeforeInTranslationUnit` looks heavy).

================
Comment at: lib/AST/CloneDetection.cpp:88
@@ +87,3 @@
+/// It is defined as:
+///   H(D) = PrimeFactor^(N-1) * D[0] + PrimeFactor^(N-2) * D[1] + ... + D[N]
+/// where
----------------
It seems that the last index is off-by-one, i think you meant something like:

```
H(D) = PrimeFactor^(N-1) * D[0] + PrimeFactor^(N-2) * D[1] + ... + D[N-1]
```

================
Comment at: lib/AST/CloneDetection.cpp:175
@@ +174,3 @@
+      ++StartIterator;
+    auto EndIterator = Iterator;
+    for (unsigned I = 0; I < Length; ++I)
----------------
Shouldn't it say something like "`EndIterator = StartIterator`"? Because when `StartPos` is non-zero, in your code we get [`StartPos`, `Length`) instead of [`StartPos`, `StartPos` + `Length`). In your code, `Length` acts as some kind of `EndPos` instead.

================
Comment at: lib/AST/CloneDetection.cpp:194
@@ +193,3 @@
+      } else {
+        StmtSequence ChildSequence(StmtSequence(Child, Context));
+
----------------
Simplifies to:

```
StmtSequence ChildSequence(Child, Context);
```

================
Comment at: lib/AST/CloneDetection.cpp:238
@@ +237,3 @@
+  // the CompoundStmt.
+  auto CS = dyn_cast<CompoundStmt>(CurrentStmt.front());
+  if (!CS)
----------------
I think that the code that deals with //computing// data for sections of compound statements (as opposed to //saving// this data) should be moved to `VisitStmt()`. Or, even better, to `VisitCompoundStmt()` - that's the whole point of having a visitor, after all. Upon expanding the complexity of the hash, you'd probably provide more special cases for special statement classes, which would all sit in their own Visit method.

That's for the future though, not really critical.

================
Comment at: lib/AST/CloneDetection.cpp:262
@@ +261,3 @@
+
+      CloneDetector::StmtData SubData(SubHash.getValue(), SubComplexity);
+      CD.add(Sequence, SubData);
----------------
This code is duplicated from the beginning of the function, which synergizes with my point above: if `SaveData()` contained only that much, then it could have been re-used on both call sites.

================
Comment at: lib/StaticAnalyzer/Checkers/CloneChecker.cpp:40
@@ +39,3 @@
+  CloneDetector.AnalyzeTranslationUnitDecl(
+      TU->getASTContext().getTranslationUnitDecl());
+
----------------
`TU->getASTContext().getTranslationUnitDecl()` is always equal to `TU` - there should, in most cases, be only one TranslationUnitDecl object around, but even if there'd be more some day, i think this invariant would still hold. 

================
Comment at: lib/StaticAnalyzer/Checkers/CloneChecker.cpp:43
@@ +42,3 @@
+  std::vector<CloneDetector::CloneGroup> CloneGroups;
+  CloneDetector.findClones(CloneGroups);
+
----------------
An idea: because this function optionally accepts `MinGroupComplexity`, you may probably want to expose this feature as a //checker-specific option// - see `MallocChecker::IsOptimistic` as an example, shouldn't be hard.

================
Comment at: lib/StaticAnalyzer/Checkers/CloneChecker.cpp:45
@@ +44,3 @@
+
+  DiagnosticsEngine &DiagEngine = Mgr.getDiagnostic();
+
----------------
zaks.anna wrote:
> Is it possible to report these through BugReporter?
I think that'd require changes in the BugReporter to allow easily adding extra pieces to path-insensitive reports (which is quite a wanted feature in many AST-based checkers - assuming we want more AST-based checkers to be moved into the analyzer specifically for better reporting, integration with scan-build, and stuff like that).

================
Comment at: test/Analysis/copypaste/test-min-max.cpp:39
@@ +38,3 @@
+
+int foo(int a, int b) {
+  return a + b;
----------------
Perhaps add a `// no-warning` comment to spots that should not cause a warning? This comment has no effect on the test suite, just a traditional way to notify the reader.


http://reviews.llvm.org/D20795





More information about the cfe-commits mailing list