[cfe-dev] [GSoC 2016] Midterm report - Finding and reporting bugs caused by copy and paste

Raphael Isemann via cfe-dev cfe-dev at lists.llvm.org
Mon Jun 20 12:30:54 PDT 2016

Hi everyone,

This is the midterm report of this year's GSoC project "Finding and
reporting bugs caused by copy and paste" [1].

The goal of the project is to scan C++ source code for identical or
similar pieces of code to reduce code redundancy and detect bugs
caused by copy-pasting.

The way this project approaches this problem is by hashing all Stmts
in the AST and then searching the hash codes for identical values.
For performance reasons all hashes are calculated with a new AST
hashing code that only needs linear time to hash all Stmts in an AST.
Also, the hashing is done with a locality-sensitive hash function that
maps similarly structured Stmts into the same hash buckets, therefore
also enabling us to search for similar Stmts.
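
To make the idea above more concrete, here is a minimal, self-contained
sketch of bottom-up hashing with bucket lookup. It is not the code from
the patch: ToyStmt, mix, hashStmt, cloneCandidates and addChild are
hypothetical names invented for this illustration, and the mixing
constants are an arbitrary FNV-style choice. The sketch assumes that a
statement's hash depends only on its kind and on the hashes of its
children, so each node is visited once (linear time) and statements that
differ only in details the hash ignores end up in the same bucket.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a clang Stmt: a kind tag plus child statements.
struct ToyStmt {
  int kind;
  std::vector<std::unique_ptr<ToyStmt>> children;
  explicit ToyStmt(int k) : kind(k) {}
};

// Maps a hash code to every statement that produced it.
using Buckets = std::unordered_map<std::uint64_t, std::vector<const ToyStmt *>>;

// Order-dependent mixing step (FNV-1a-style; an assumption of this sketch).
inline std::uint64_t mix(std::uint64_t seed, std::uint64_t value) {
  return (seed ^ value) * 1099511628211ULL;
}

// Postorder walk: children are hashed first and the parent reuses their
// results, so every node is hashed exactly once.
inline std::uint64_t hashStmt(const ToyStmt &s, Buckets &buckets) {
  std::uint64_t h =
      mix(14695981039346656037ULL, static_cast<std::uint64_t>(s.kind));
  for (const auto &child : s.children) {
    assert(child && "child statements must not be null");
    h = mix(h, hashStmt(*child, buckets));
  }
  buckets[h].push_back(&s);
  return h;
}

// Buckets holding more than one statement are clone candidates. Hash
// collisions are possible, so a real checker would still verify each group.
inline std::vector<std::vector<const ToyStmt *>>
cloneCandidates(const ToyStmt &root) {
  Buckets buckets;
  hashStmt(root, buckets);
  std::vector<std::vector<const ToyStmt *>> groups;
  for (const auto &entry : buckets)
    if (entry.second.size() > 1)
      groups.push_back(entry.second);
  return groups;
}

// Convenience helper for building toy trees.
inline ToyStmt *addChild(ToyStmt &parent, int kind) {
  parent.children.push_back(std::make_unique<ToyStmt>(kind));
  return parent.children.back().get();
}
```

Building a root with two subtrees of the same shape and calling
cloneCandidates(root) yields a group that contains both subtrees. Since
the toy hash looks only at statement kinds, it already treats statements
with different identifiers as equal, which is one simple way to get the
"similar structure, same bucket" behavior described above.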

So far, the project has produced a finished patch that adds
postorder-traversal support to the RecursiveASTVisitor [2] and a
work-in-progress patch that adds a checker implementing the above
functionality [3] (see [4] for the working branch).

The checker is currently able to find similar pieces of code, detect
potential errors in them and provide suggestions for fixing them (see
[5] for an example use case).

The next steps are testing the checker on real-world projects and
preparing it for merging into upstream. Merging the checker into
upstream is especially important for the project as it would pave the
way for testing the code in production environments on real code.

After the checker is finished, we will focus on researching cross-TU
support for the clang SA checker framework and ensuring that the
hashing code stays in sync with the clang AST API. Also scheduled are
improving the checker with new ways for finding code clones and
investigating how other parts of clang (for example Stmt::Profile or
the IdenticalExprChecker) can benefit from this project.

All in all, the project is currently following the proposed work
items, with the exception that we work through the work items in an
order that allows an incremental development process that improves
existing infrastructure instead of starting from scratch as originally
proposed.

That's everything for now. Feel free to mail me if you have questions
or suggestions!



[1] https://docs.google.com/document/d/1hY_EUIqeQ6cAYaIqrePWvcy6XmkMv5BoxrfskEv5Tl0/edit
[3] http://reviews.llvm.org/D20795
[4] https://github.com/Teemperor/clang/tree/GSoC2016
[5] https://docs.google.com/document/d/1kye8k5WbVRRon2XPkjvvBR40wMro65TJf9Iv8qAoWDM/edit?usp=sharing
