[PATCH] D20795: Added basic capabilities to detect source code clones.

Anna Zaks via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 11 00:18:14 PDT 2016


zaks.anna added a comment.

Hi,

Thank you for working on this!

Here are several comments from a **partial** review.

Can you talk a bit about how do you envision to generalize this to copy and paste detection, where the pasted code is only partially modified? Do you think this could be generalized in that way?


================
Comment at: include/clang/AST/CloneDetection.h:20
@@ +19,3 @@
+
+#include <map>
+#include <vector>
----------------
Please, check if there are better data structures you could be using: http://llvm.org/docs/ProgrammersManual.html#map-like-containers-std-map-densemap-etc

================
Comment at: include/clang/AST/CloneDetection.h:131
@@ +130,3 @@
+  bool operator<(const StmtSequence &Other) const {
+    return std::tie(S, StartIndex, EndIndex) <
+           std::tie(Other.S, Other.StartIndex, Other.EndIndex);
----------------
We are comparing the Stmt pointers here. Does this make results non-deterministic?


================
Comment at: include/clang/AST/CloneDetection.h:145
@@ +144,3 @@
+///
+/// This class first needs to be provided with (possibly multiple) translation
+/// units by passing them to \p AnalyzeTranslationUnitDecl .
----------------
Let's not talk about multiple TU here since much more is needed to support them. Maybe we should just say that, first, given a TU, the class generates the clone identifiers (hashes?) of statements in it (by running AnalyzeTranslationUnitDecl)?

The description of the method already states that it can be called multiple times.

================
Comment at: include/clang/AST/CloneDetection.h:161
@@ +160,3 @@
+  /// Holds the generated data for a StmtSequence
+  struct StmtData {
+    /// The generated hash value.
----------------
StmtData is too generic of a name. Maybe it could be called something like code signature, clone signature, or code hash?

================
Comment at: include/clang/AST/CloneDetection.h:166
@@ +165,3 @@
+    ///
+    /// This scalar values serves as a simple way of filtering clones
+    /// that are too small to be reported. A greater value indicates that the
----------------
serves -> serve

================
Comment at: include/clang/AST/CloneDetection.h:186
@@ +185,3 @@
+  /// \param S The given StmtSequence
+  /// \param Error Output parameter that is set to true if no data was generated
+  ///        for the given StmtSequence. This can happen if the given
----------------
Could this return an llvm Optional instead of the error code?

================
Comment at: include/clang/AST/CloneDetection.h:201
@@ +200,3 @@
+  /// \brief Stores the StmtSequence with its associated data in the search
+  ///        database of this CloneDetector.
+  ///
----------------
database -> map?

I would just say "Stores the StmtSequence with its associated data to allow future querying."

================
Comment at: include/clang/AST/CloneDetection.h:209
@@ +208,3 @@
+
+  /// \brief Searches all translation units in this CloneDetector for clones.
+  ///
----------------
Please, do not talk about "all translation units" here. Maybe say something like previously seen/stored statements/code or something like that.

================
Comment at: include/clang/AST/CloneDetection.h:217
@@ +216,3 @@
+  void findClones(std::vector<CloneGroup> &Result,
+                  unsigned MinGroupComplexity = 10);
+
----------------
Make 10 into a constant. It probably should be a checker parameter. 

================
Comment at: include/clang/StaticAnalyzer/Checkers/Checkers.td:668
@@ +667,3 @@
+
+let ParentPackage = CloneDetection in {
+
----------------
This should be an "alpha" package until it's considered to be ready for end users.

================
Comment at: lib/AST/CMakeLists.txt:10
@@ -9,3 +9,3 @@
   ASTImporter.cpp
   ASTTypeTraits.cpp
   AttrImpl.cpp
----------------
Again, I do not think this should be in the AST. Who the users of this will be? If you envision users outside of Clang Static Analyzer, maybe we could add this to lib/Analysis/.

================
Comment at: lib/AST/CloneDetection.cpp:56
@@ +55,3 @@
+StmtSequence::iterator StmtSequence::begin() const {
+  if (holdsSequence()) {
+    auto CS = cast<CompoundStmt>(S);
----------------
http://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code

================
Comment at: lib/AST/CloneDetection.cpp:103
@@ +102,3 @@
+/// \endcode
+class HashValue {
+  unsigned Value = 0;
----------------
This should probably be a utility instead of something specific to this checker. Do we have something similar to this in LLVM already?

================
Comment at: lib/StaticAnalyzer/Checkers/CloneChecker.cpp:45
@@ +44,3 @@
+
+  DiagnosticsEngine &DiagEngine = Mgr.getDiagnostic();
+
----------------
Is it possible to report these through BugReporter?


http://reviews.llvm.org/D20795





More information about the cfe-commits mailing list