r312222 - [analyzer] Performance optimizations for the CloneChecker

Raphael Isemann via cfe-commits cfe-commits at lists.llvm.org
Thu Aug 31 00:10:46 PDT 2017


Author: teemperor
Date: Thu Aug 31 00:10:46 2017
New Revision: 312222

URL: http://llvm.org/viewvc/llvm-project?rev=312222&view=rev
Log:
[analyzer] Performance optimizations for the CloneChecker

Summary:
This patch  aims at optimizing the CloneChecker for larger programs. Before this
patch we took around 102 seconds to analyze sqlite3 with a complexity value of
50. After this patch we now take 2.1 seconds to analyze sqlite3.

The biggest performance optimization is that we now put the constraint for group
size before the constraint for the complexity. The group size constraint is much
faster in comparison to the complexity constraint as it only does a simple
integer comparison. The complexity constraint on the other hand actually
traverses each Stmt and even checks the macro stack, so it is obviously not able
to handle larger amounts of incoming clones. The new order filters out all the
single-clone groups that the type II constraint generates in a faster way before
passing the fewer remaining clones to the complexity constraint. This reduced
runtime by around 95%.

The other change is that we also delay the verification part of the type II
clones back in the chain of constraints. This required to split up the
constraint into two parts - a verification and a hash constraint (which is also
making it more similar to the original design of the clone detection algorithm).
The reasoning for this is the same as before: The verification constraint has to
traverse many statements and shouldn't be at the start of the constraint chain.
However, as the type II hashing has to be the first step in our algorithm, we
have no other choice but split this constrain into two different ones. Now our
group size and complexity constrains filter out a chunk of the clones before
they reach the slow verification step, which reduces the runtime by around 8%.

I also kept the full type II constraint around - that now just calls it's two
sub-constraints - in case someone doesn't care about the performance benefits
of doing this.

Reviewers: NoQ

Reviewed By: NoQ

Subscribers: klimek, v.g.vassilev, xazax.hun, cfe-commits

Differential Revision: https://reviews.llvm.org/D34182

Modified:
    cfe/trunk/include/clang/Analysis/CloneDetection.h
    cfe/trunk/lib/Analysis/CloneDetection.cpp
    cfe/trunk/lib/StaticAnalyzer/Checkers/CloneChecker.cpp
    cfe/trunk/unittests/Analysis/CloneDetectionTest.cpp

Modified: cfe/trunk/include/clang/Analysis/CloneDetection.h
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Analysis/CloneDetection.h?rev=312222&r1=312221&r2=312222&view=diff
==============================================================================
--- cfe/trunk/include/clang/Analysis/CloneDetection.h (original)
+++ cfe/trunk/include/clang/Analysis/CloneDetection.h Thu Aug 31 00:10:46 2017
@@ -252,22 +252,25 @@ public:
       std::function<bool(const StmtSequence &, const StmtSequence &)> Compare);
 };
 
-/// Searches all children of the given clones for type II clones (i.e. they are
-/// identical in every aspect beside the used variable names).
-class RecursiveCloneTypeIIConstraint {
-
-  /// Generates and saves a hash code for the given Stmt.
-  /// \param S The given Stmt.
-  /// \param D The Decl containing S.
-  /// \param StmtsByHash Output parameter that will contain the hash codes for
-  ///                    each StmtSequence in the given Stmt.
-  /// \return The hash code of the given Stmt.
-  ///
-  /// If the given Stmt is a CompoundStmt, this method will also generate
-  /// hashes for all possible StmtSequences in the children of this Stmt.
-  size_t saveHash(const Stmt *S, const Decl *D,
-                  std::vector<std::pair<size_t, StmtSequence>> &StmtsByHash);
+/// This constraint moves clones into clone groups of type II via hashing.
+///
+/// Clones with different hash values are moved into separate clone groups.
+/// Collisions are possible, and this constraint does nothing to address this
+/// them. Add the slower RecursiveCloneTypeIIVerifyConstraint later in the
+/// constraint chain, not necessarily immediately, to eliminate hash collisions
+/// through a more detailed analysis.
+class RecursiveCloneTypeIIHashConstraint {
+public:
+  void constrain(std::vector<CloneDetector::CloneGroup> &Sequences);
+};
 
+/// This constraint moves clones into clone groups of type II by comparing them.
+///
+/// Clones that aren't type II clones are moved into separate clone groups.
+/// In contrast to the RecursiveCloneTypeIIHashConstraint, all clones in a clone
+/// group are guaranteed to be be type II clones of each other, but it is too
+/// slow to efficiently handle large amounts of clones.
+class RecursiveCloneTypeIIVerifyConstraint {
 public:
   void constrain(std::vector<CloneDetector::CloneGroup> &Sequences);
 };

Modified: cfe/trunk/lib/Analysis/CloneDetection.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Analysis/CloneDetection.cpp?rev=312222&r1=312221&r2=312222&view=diff
==============================================================================
--- cfe/trunk/lib/Analysis/CloneDetection.cpp (original)
+++ cfe/trunk/lib/Analysis/CloneDetection.cpp Thu Aug 31 00:10:46 2017
@@ -238,9 +238,18 @@ static size_t createHash(llvm::MD5 &Hash
   return HashCode;
 }
 
-size_t RecursiveCloneTypeIIConstraint::saveHash(
-    const Stmt *S, const Decl *D,
-    std::vector<std::pair<size_t, StmtSequence>> &StmtsByHash) {
+/// Generates and saves a hash code for the given Stmt.
+/// \param S The given Stmt.
+/// \param D The Decl containing S.
+/// \param StmtsByHash Output parameter that will contain the hash codes for
+///                    each StmtSequence in the given Stmt.
+/// \return The hash code of the given Stmt.
+///
+/// If the given Stmt is a CompoundStmt, this method will also generate
+/// hashes for all possible StmtSequences in the children of this Stmt.
+static size_t
+saveHash(const Stmt *S, const Decl *D,
+         std::vector<std::pair<size_t, StmtSequence>> &StmtsByHash) {
   llvm::MD5 Hash;
   ASTContext &Context = D->getASTContext();
 
@@ -340,7 +349,7 @@ static bool areSequencesClones(const Stm
   return DataLHS == DataRHS;
 }
 
-void RecursiveCloneTypeIIConstraint::constrain(
+void RecursiveCloneTypeIIHashConstraint::constrain(
     std::vector<CloneDetector::CloneGroup> &Sequences) {
   // FIXME: Maybe we can do this in-place and don't need this additional vector.
   std::vector<CloneDetector::CloneGroup> Result;
@@ -381,8 +390,7 @@ void RecursiveCloneTypeIIConstraint::con
 
       for (; i < StmtsByHash.size(); ++i) {
         // A different hash value means we have reached the end of the sequence.
-        if (PrototypeHash != StmtsByHash[i].first ||
-            !areSequencesClones(StmtsByHash[i].second, Current.second)) {
+        if (PrototypeHash != StmtsByHash[i].first) {
           // The current sequence could be the start of a new CloneGroup. So we
           // decrement i so that we visit it again in the outer loop.
           // Note: i can never be 0 at this point because we are just comparing
@@ -405,6 +413,14 @@ void RecursiveCloneTypeIIConstraint::con
   Sequences = Result;
 }
 
+void RecursiveCloneTypeIIVerifyConstraint::constrain(
+    std::vector<CloneDetector::CloneGroup> &Sequences) {
+  CloneConstraint::splitCloneGroups(
+      Sequences, [](const StmtSequence &A, const StmtSequence &B) {
+        return areSequencesClones(A, B);
+      });
+}
+
 size_t MinComplexityConstraint::calculateStmtComplexity(
     const StmtSequence &Seq, const std::string &ParentMacroStack) {
   if (Seq.empty())

Modified: cfe/trunk/lib/StaticAnalyzer/Checkers/CloneChecker.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/StaticAnalyzer/Checkers/CloneChecker.cpp?rev=312222&r1=312221&r2=312222&view=diff
==============================================================================
--- cfe/trunk/lib/StaticAnalyzer/Checkers/CloneChecker.cpp (original)
+++ cfe/trunk/lib/StaticAnalyzer/Checkers/CloneChecker.cpp Thu Aug 31 00:10:46 2017
@@ -81,11 +81,11 @@ void CloneChecker::checkEndOfTranslation
   // because reportSuspiciousClones() wants to search them for errors.
   std::vector<CloneDetector::CloneGroup> AllCloneGroups;
 
-  Detector.findClones(AllCloneGroups,
-                      FilenamePatternConstraint(IgnoredFilesPattern),
-                      RecursiveCloneTypeIIConstraint(),
-                      MinComplexityConstraint(MinComplexity),
-                      MinGroupSizeConstraint(2), OnlyLargestCloneConstraint());
+  Detector.findClones(
+      AllCloneGroups, FilenamePatternConstraint(IgnoredFilesPattern),
+      RecursiveCloneTypeIIHashConstraint(), MinGroupSizeConstraint(2),
+      MinComplexityConstraint(MinComplexity),
+      RecursiveCloneTypeIIVerifyConstraint(), OnlyLargestCloneConstraint());
 
   if (ReportSuspiciousClones)
     reportSuspiciousClones(BR, Mgr, AllCloneGroups);

Modified: cfe/trunk/unittests/Analysis/CloneDetectionTest.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/Analysis/CloneDetectionTest.cpp?rev=312222&r1=312221&r2=312222&view=diff
==============================================================================
--- cfe/trunk/unittests/Analysis/CloneDetectionTest.cpp (original)
+++ cfe/trunk/unittests/Analysis/CloneDetectionTest.cpp Thu Aug 31 00:10:46 2017
@@ -69,8 +69,9 @@ TEST(CloneDetector, FilterFunctionsByNam
   // all statements from functions which names start with "bar".
   std::vector<CloneDetector::CloneGroup> CloneGroups;
   Detector.findClones(CloneGroups, NoBarFunctionConstraint(),
-                      RecursiveCloneTypeIIConstraint(),
+                      RecursiveCloneTypeIIHashConstraint(),
                       MinComplexityConstraint(2), MinGroupSizeConstraint(2),
+                      RecursiveCloneTypeIIVerifyConstraint(),
                       OnlyLargestCloneConstraint());
 
   ASSERT_EQ(CloneGroups.size(), 1u);
@@ -86,8 +87,9 @@ TEST(CloneDetector, FilterFunctionsByNam
   // Retry above's example without the filter...
   CloneGroups.clear();
 
-  Detector.findClones(CloneGroups, RecursiveCloneTypeIIConstraint(),
+  Detector.findClones(CloneGroups, RecursiveCloneTypeIIHashConstraint(),
                       MinComplexityConstraint(2), MinGroupSizeConstraint(2),
+                      RecursiveCloneTypeIIVerifyConstraint(),
                       OnlyLargestCloneConstraint());
   ASSERT_EQ(CloneGroups.size(), 1u);
   ASSERT_EQ(CloneGroups.front().size(), 4u);




More information about the cfe-commits mailing list