[cfe-dev] GSoC proposal - Finding and analysing copy-pasted code with clang

Tue Apr 5 17:05:28 PDT 2016

Hi,

thanks for the hints! I'll try to get early feedback on anything I work on :)

- Raphael

2016-04-05 23:12 GMT+02:00 Anna Zaks <ganna at apple.com>:
> Raphael,
>
> I just noticed that you propose to store analysis results in a JSON
> database. Note that the static analyzer already has a rich format (plist)
> that is used to store reports for further processing. Also, that format is
> being serialized into a database by CodeChecker.
>
> If your goal is to integrate your project into LLVM at some point, I suggest
> sending out incremental changes for review, following the incremental
> development policy. This has a lot of benefits; for example, it will allow
> the community to provide you with sufficient feedback along the way!
>
> Cheers,
> Anna.
>
> On Apr 4, 2016, at 2:28 PM, Anna Zaks <ganna at apple.com> wrote:
>
>
> On Mar 24, 2016, at 6:59 AM, Vassil Vassilev <v.g.vassilev at gmail.com> wrote:
>
> Hi all,
>  I just want to resend a few things from our internal discussion here for
> the record. I will summarize them:
>  * Raphael contacted me early (1-2 months before GSoC started) to express
> his interest in the copy-paste project. He wanted to continue the
> development of our prototype (done by Kirill in GSoC 2015).
>  * Raphael learned from Kirill and me the current state of the project and
> the current implementation deficiencies (this section of his proposal
> includes our feedback).
>  * Raphael started playing with the implementation early and had a few good
> ideas on how to further improve the code-clone hashing.
>  * Raphael also proposed to try to feed back as much as possible to clang's
> mainline (something that we didn't have time to do last year).
>  * Raphael has good experience with relevant projects such as his work on
> WebAssembly.
>
>  If you are interested in more technical details please let us know.
>
>  I find this proposal very reasonable and the candidacy very strong. From
> the CV and the proposal I think the candidate can do what he suggests
> and I'd be happy to mentor him. I'd be happy to hear Anna's comments on his
> proposal.
>
>
> Hi Vassil and Raphael,
>
> Sorry for the delay, I just got to reading your proposal. Below are some
> comments.
>
> If I understand correctly, you are proposing to:
>  1) Add another stand-alone tool + a library that performs cross-translation
> unit clone detection at the AST level.
>  2) Add a checker to the Clang Static Analyzer that performs (the same?)
> clone detection but limited to a single translation unit.
>
> How much code reuse will there be between the two? Will the stand-alone tool
> be built on top of the checker? I did not get that feeling from the
> proposal, especially since the stand-alone tool will be completed first. It
> seems that all of the goals mentioned in the proposal except for the cross
> translation unit analysis could be done in the static analyzer. So why not
> start with that? I think it would be very beneficial for the project to have
> some clone detection committed in tree, immediately available to all of the
> existing users of the static analyzer!
>
> One of the obstacles in contributing the existing checker to the analyzer is
> issue reporting. Have you considered reporting the subsequent clones as a
> note on the first clone? The clones are related and it looks like the
> current output does not highlight that. (The static analyzer does not
> support notes right now, so you'd need to extend that functionality.)
>
> I am very apprehensive about adding yet another analysis tool to the clang
> ecosystem. Having clang-tidy, the Clang Static Analyzer + yet another tool
> would be quite confusing to the user. The most user friendly approach is to
> have a single tool that highlights all the problems the users have in their
> code. I do acknowledge that it would not be possible to make the clone
> checker scale to cross translation unit analysis since we do not currently
> have the infrastructure to support that. However, building the stand-alone
> tool on top of the checker would allow turning it into a cross-translation
> unit checker once the infrastructure is added to the static analyzer.
>
> Have you looked into CodeChecker and the new scan-build.py projects? They do
> rely on using the compilation database, which is something you plan on doing
> as well. Can you reuse scan-build.py instead of writing your own build
> interposition? The goal of CodeChecker is to collect and display static
> analysis reports generated by all clang-based tools, specifically, both
> clang-tidy and the Clang Static Analyzer are already supported. It would be
> valuable if the new stand-alone tools were incorporated into the same
> workflow. This way the users could have a single point of entry when they
> look for bugs.
>
> CodeChecker incorporates a nice bug viewing UI. Integrating clone reporting
> into that UI would be great. However, you might need to extend/modify both
> the reporting and the UI to make it look great.
>
> What do you think?
> Anna.
>
> --Vassil
> On 23/03/16 03:37, Raphael Isemann via cfe-dev wrote:
>
> Hi everybody,
>
> just wanted to post my GSoC proposal for feedback:
>
> https://docs.google.com/document/d/1hY_EUIqeQ6cAYaIqrePWvcy6XmkMv5BoxrfskEv5Tl0/edit?usp=sharing
>
> Feel free to comment on the document if you think something should be
> improved.
> I will also try to idle in the IRC channel from now on in case someone
> prefers that. My nickname there is also "teemperor".
>
> Cheers,
>
> Raphael Isemann