[PATCH] D20795: Added basic capabilities to detect source code clones.

Vassil Vassilev via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 7 02:10:54 PDT 2016


v.g.vassilev requested changes to this revision.
This revision now requires changes to proceed.

================
Comment at: include/clang/AST/CloneDetection.h:148
@@ +147,3 @@
+/// This class only searches for clones in exectuable source code
+/// (e.g. function bodies). Other clones (e.g. cloned comments or declarations)
+/// are not supported.
----------------
It might be a good idea to detect cloned comments, too.

================
Comment at: include/clang/AST/CloneDetection.h:153
@@ +152,3 @@
+/// in the given statements.
+/// This is done by hashing all statements using a locality-sensitive hash
+/// function that generates identical hash values for similar statement
----------------
Can you move this on the previous line?

================
Comment at: include/clang/AST/CloneDetection.h:166
@@ +165,3 @@
+    /// that are too small to be reported.
+    /// A greater value indicates that the related StmtSequence is probably
+    /// more interesting to the user.
----------------
This should go on prev line too. Please check all comments for this pattern.

================
Comment at: include/clang/AST/CloneDetection.h:215
@@ +214,3 @@
+  ///         StmtSequences that were identified to be clones of each other.
+  std::vector<CloneGroup> findClones(unsigned MinGroupComplexity = 10);
+
----------------
We shouldn't copy (or hope this would be std::moved). Can you pass it as a reference to the argument list?

================
Comment at: lib/AST/CloneDetection.cpp:22
@@ +21,3 @@
+
+StmtSequence::StmtSequence(CompoundStmt *Stmt, ASTContext *Context,
+                           unsigned StartIndex, unsigned EndIndex)
----------------
It is a very common pattern in clang the ASTContext to be passed as ASTContext&, this should simplify the body of the constructors.

================
Comment at: lib/AST/CloneDetection.cpp:45
@@ +44,3 @@
+  // surround the other sequence.
+  bool StartIsInBounds = Context->getSourceManager().isBeforeInTranslationUnit(
+                             getStartLoc(), Other.getStartLoc()) ||
----------------
I'd factor out the Context->getSourceManager() in a local variable.

================
Comment at: lib/AST/CloneDetection.cpp:54
@@ +53,3 @@
+
+Stmt *const *StmtSequence::begin() const {
+  if (holdsSequence()) {
----------------
I am not sure what is the intent but shouldn't that be `Stmt const* const*`?

================
Comment at: lib/AST/CloneDetection.cpp:56
@@ +55,3 @@
+  if (holdsSequence()) {
+    auto CS = static_cast<CompoundStmt *>(S);
+    return CS->body_begin() + StartIndex;
----------------
Please use `auto CS = cast<CompoundStmt>(S);`

This logic will crash on `void f() try {} catch(...){}` In this case we do not generate a CompoundStmt.

================
Comment at: lib/AST/CloneDetection.cpp:64
@@ +63,3 @@
+  if (holdsSequence()) {
+    auto CS = static_cast<CompoundStmt *>(S);
+    return CS->body_begin() + EndIndex;
----------------
Same as above.

================
Comment at: lib/AST/CloneDetection.cpp:84
@@ +83,3 @@
+  void addData(unsigned Data) {
+    Value *= 53;
+    Value += Data;
----------------
Still a floating magic constant. Could you factor it out in a variable with a meaningful name?

================
Comment at: lib/AST/CloneDetection.cpp:112
@@ +111,3 @@
+
+  // We need to traverse postorder over the AST for our algorithm
+  // to work as each parent expects that its children were already hashed.
----------------
Doxygen style /// ?

================
Comment at: lib/AST/CloneDetection.cpp:139
@@ +138,3 @@
+    // Iterate over each child in the sub-sequence.
+    for (auto I = StartIterator; I != EndIterator; ++I) {
+      Stmt *Child = *I;
----------------
Could we use a range-based for loop. This looks odd.

================
Comment at: lib/AST/CloneDetection.cpp:161
@@ +160,3 @@
+  void SaveData(StmtSequence CurrentStmt, HashValue Hash,
+                       unsigned Complexity);
+
----------------
Indent.

================
Comment at: lib/AST/CloneDetection.cpp:164
@@ +163,3 @@
+  CloneDetector &CD;
+  ASTContext &Context;
+};
----------------
Could you move these members above. I don't think this is common in the codebase.

================
Comment at: lib/AST/CloneDetection.cpp:293
@@ +292,3 @@
+  // contains another group, we only need to return the bigger group.
+  for (unsigned I = 0; I < Result.size(); ++I) {
+    for (unsigned J = 0; J < Result.size(); ++J) {
----------------
Capital letters are more common when naming iterators. What about small `i` and `j` here. And down you can use `I` and you won't need to break the line.


http://reviews.llvm.org/D20795





More information about the cfe-commits mailing list