[cfe-dev] C++ algorithm analysis tool

Wed Apr 5 03:18:07 PDT 2017

Clone checker currently does discard variable names, as long as they 
follow the same pattern, eg. `a = a + b` is a clone of `c = c + d` but 
not of `a = b + b`. It also has a mode in which it warns about patterns 
that are almost-clones (broken in one variable) as copy-paste errors.

Function names matter though, so if we expect to find both SWAP and 
std::swap, we'd need two runs for now.

As soon as https://reviews.llvm.org/D23418 lands, which is right about 
now, the clone checker would also be *highly* configurable and modular, 
and all sorts of per-use-case tweaks would be possible. Fuzzy-matching 
function names is probably not impossible.

On 4/5/17 1:11 PM, Alex L wrote:
>
>
> On 5 April 2017 at 11:05, Artem Dergachev via cfe-dev 
> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>
>     On second thought, maybe it's quite neat.
>
>     I mean, like, a person who doesn't know anything about AST and
>     about matchers and whatever, could probably do something like that
>
>     $ cat < pattern.c
>     void bubblesort(int *array, int length) {
>           for (int i = 0; i < length - 1; ++i)
>              for (int j = 0; j < length - i - 1; ++j)
>                  if (array[j] > array[j + 1]) {
>                      int tmp = array[j];
>                      array[j] = array[j + 1];
>                      array[j + 1] = tmp;
>                  }
>     }
>     ^D
>
>     and then
>
>     $ clang-finder --pattern pattern.c my_project.c
>
>     and the tool would find all clones of the pattern in the project,
>     by constructing the AST for the pattern and hashing it and looking
>     for clones. The pattern would be in a different AST context, but i
>     guess for the clone checker it shouldn't be a problem, so we
>     wouldn't need to deal with ASTImporter for that.
>
>
> Pardon my ignorance, but will the clone checker match patterns that 
> have different variable names (e.g. values instead of array and size 
> instead of length)?
>
> Also, is it possible to use patterns recursively? E.g. Say I want to 
> detect naive swap and std::swap in the bubble sort:
>
> swap1.c:
>                  int tmp = array[j];
>                  array[j] = array[j + 1];
>                  array[j + 1] = tmp;
> swap2.c:
>                  std::swap(array[j], array[j + 1]);
>
> pattern.c:
> void bubblesort(int *array, int length) {
>       for (int i = 0; i < length - 1; ++i)
>          for (int j = 0; j < length - i - 1; ++j)
>              if (array[j] > array[j + 1]) {
>                  SWAP(array, j)
>              }
> }
>
>
>
>
>     On 4/4/17 12:25 PM, Vassil Vassilev wrote:
>
>         On 04/04/17 10:59, Raphael Isemann wrote:
>
>             I feel the implementation of an algorithm in the STL and the
>             equivalent implementation written by the user are usually
>             so different
>             in terms of syntax that the current clone detection won't
>             work that
>             well. If we had a constraint for estimating semantic
>             equality, then it
>             would be a different situation, but I don't expect an
>             implementation
>             of this anytime soon :).
>
>             However, if we are satisfied with syntactic equality, then
>             this can be
>             done in a few minutes with the clone detector
>             infrastructure with
>             something like this pseudocode:
>
>                 detector.findClones(RecursiveTypeIIConstraint(),
>             MinFunctionBodyCount(1), MinComplexity(20));
>
>             This would detect code like this:
>
>             void bubblesort(int *array, int length) {
>                   for (int i = 0; i < length - 1; ++i)
>                      for (int j = 0; j < length - i - 1; ++j)
>                          if (array[j] > array[j + 1]) {
>                              int tmp = array[j];
>                              array[j] = array[j + 1];
>                              array[j + 1] = tmp;
>                          }
>             }
>
>             int main() {
>                   for (int i = 0; i < length - 1; ++i) //
>             expect-warning{You could
>             call the function 'bubblesort' instead
>                      for (int j = 0; j < length - i - 1; ++j)
>                          if (array[j] > array[j + 1]) {
>                              int tmp = array[j];
>                              array[j] = array[j + 1];
>                              array[j + 1] = tmp;
>                          }
>             }
>
>             - Raphael
>
>
>           That being said, patches are welcome. The current
>         infrastructure is friendly for implementing semantic clone
>         detection constraints (clones type III). We'd be interested to
>         help if necessary.
>
>         -- Vassil
>
>
>             2017-04-04 10:35 GMT+02:00 Artem Dergachev
>             <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>>:
>
>                 +CC CloneDetector guys.
>
>                 Hmm, the idea of making body-farms and then using
>                 CloneChecker to find
>                 clones of synthesized bodies in the actual code looks
>                 curious and funny,
>                 though i'm not immediately seeing how is it superior
>                 to ASTMatchers.
>
>
>
>                 On 4/1/17 1:49 PM, Kirill Bobyrev via cfe-dev wrote:
>
>
>                     Hi Chris,
>
>                     To my knowledge, there isn't.
>
>                     I don't recall where I got the idea, but I gave it
>                     a try last summer
>                     trying to implement a clang-tidy check doing what
>                     you proposed. I didn't
>                     have enough time to complete it, though, and I
>                     only managed to detect one or
>                     two very simple patterns.
>
>                     After thinking about this idea for some time I
>                     found that clang-tidy might
>                     be a perfect place for that, not sure whether a
>                     separate tool would be
>                     beneficial. The task of detecting a specific
>                     pattern is very similar to what
>                     clang-tidy checks do in a wide range of tasks.
>                     Also, there'd be a separate
>                     heuristic set for each standard algorithm, which
>                     makes the partitioning into
>                     different checks (for each popular standard
>                     library algorithm) natural.
>
>                     In my opinion, such checks would be useful, I'd be
>                     interested in seeing a
>                     proof-of-concept of some sort.
>
>                     One more idea I have in mind: it might be
>                     interesting to try using
>                     CloneChecker (a check of Clang Static Analyzer) to
>                     detect similar patterns
>                     in a generic way, but I'm not sure how beneficial
>                     that would be in practice.
>                     Still, might worth a try.
>
>                     +CC Alex, he might have some thoughts about this.
>
>                     Kind regards,
>                     Kirill
>
>
>                     On 01/04/17 02:42, Christopher Di Bella via
>                     cfe-dev wrote:
>
>                         Hey everyone,
>
>                         Just wondering if there's a clang tool that
>                         can analyse an algorithm to
>                         suggest where standard algorithms can be
>                         replace handwritten algos?
>                         E.g.
>
>                         int i = 0;
>                         for (; i < v.size(); ++i)
>                             if (v[i] == expected)
>                                break;
>                         if (i != v.size())
>                             some_function(v[i]);
>
>                         Can be rewritten to
>
>                         auto i = find(v.begin(), v.end(), expected);
>                         if (i != v.end())
>                             some_function(*i);
>
>                         or in C++17:
>
>                         if (auto i = find(v.begin(), v.end(),
>                         expected); i != v.end())
>                             some_function(*i);
>
>                         If not, how difficult a task is it to write
>                         such a tool? Is there
>                         anything that one should take into special
>                         consideration while writing this
>                         tool? Do you think it would have a lot of
>                         use-cases? (I do, based on my
>                         company's code base, and code I have seen
>                         while marking assignments).
>
>                         Cheers,
>
>                         Chris
>
>
>                         _______________________________________________
>                         cfe-dev mailing list
>                         cfe-dev at lists.llvm.org
>                         <mailto:cfe-dev at lists.llvm.org>
>                         http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>                         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>
>
>
>                     _______________________________________________
>                     cfe-dev mailing list
>                     cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>                     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>                     <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>
>
>
>     _______________________________________________
>     cfe-dev mailing list
>     cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>