[PATCH] D23418: [analyzer] Added a reusable constraint system to the CloneDetector

Mon Mar 27 15:16:33 PDT 2017

teemperor marked 7 inline comments as done.
teemperor added a comment.

Hey Leslie,

regarding performance: Last time I checked we spend most of the time on the verification of the hash values. We can do some tricks to make this faster (like delaying the verification to the end of the constraints where we have much less clones and therefore less overhead with this)

regarding false-positives: We will do some basic stuff to reduce them soon-ish (e.g. increasing the minimum clone size default value, filtering those generated files). Afterwards we probably need some fancier way of teaching the checker what is a valid clone and what not.

If you have some actual bugs that are found by the checker, please send them to me (teemperor [AT] gmail.com) that we can add them to the regression tests :)

================
Comment at: include/clang/Analysis/CloneDetection.h:218-219
+///
+/// This class should be the base class of all constraints that are used in
+/// combination with the CloneDetector class.
+/// As constraints are specified in the form of template parameters, this class
----------------
NoQ wrote:
> While you could enforce this with `static_assert<is_base_of<CloneConstraint, T>>` in the detector, i think we shouldn't be strictly enforcing this.
> 
> Providing useful utility functions is pretty much the only purpose of this class, so it intends to be useful, but this shouldn't block users from implementing the `constrain()` method completely from scratch.
> 
> Also, we could probably move the method that groups clones by hashes here as the third utility method, if more than one hash function is eventually used(?)
Everything done here beside the hashing, because I think we need to do a few changes there in the future that need some more discussion. For example:

* Split up hashing and verifying them into different constraints. This could drastically improve performance because verification of hashes id currently the biggest time-consumer on an average code-base (at least that was the case the last time I profiled :) ). Depending on how we do this we might need to rely to a certain degree on the unique-ness property of the hash codes...

* Unify the hashing we do with the hashing that we use in the AST. I think we should wait here until we are sure that this is the right thing to do and that the code is mature enough.

================
Comment at: lib/StaticAnalyzer/Checkers/CloneChecker.cpp:81-83
+  Detector.findClones(AllCloneGroups, RecursiveCloneTypeIIConstraint(),
+                      MinComplexityConstraint(MinComplexity),
+                      MinGroupSizeConstraint(2), OnlyLargestCloneConstraint());
----------------
NoQ wrote:
> Yay. I like how it looks.
\o/

https://reviews.llvm.org/D23418