[cfe-dev] GSoC proposal - Finding and analysing copy-pasted code with clang

Anna Zaks via cfe-dev cfe-dev at lists.llvm.org
Tue Apr 5 14:12:12 PDT 2016


I just noticed that you propose to store analysis results in a JSON database. Note that the static analyzer already has a rich format (plist) that is used to store reports for further processing. Also, that format is already being serialized into a database by CodeChecker.

If your goal is to integrate your project into LLVM at some point, I suggest sending out incremental changes for review, following the incremental development policy. This has a lot of benefits; for example, it will allow the community to provide you with sufficient feedback along the way!


> On Apr 4, 2016, at 2:28 PM, Anna Zaks <ganna at apple.com> wrote:
>> On Mar 24, 2016, at 6:59 AM, Vassil Vassilev <v.g.vassilev at gmail.com> wrote:
>> Hi all,
>>  I just want to resend a few things from our internal discussion here for the record. I will summarize them:
>>  * Raphael contacted me early (1-2 months before GSoC started) to express his interest in the copy-paste project. He wanted to continue the development of our prototype (done by Kirill in GSoC 2015).
>>  * Raphael learned from Kirill and me the current state of the project and the current implementation deficiencies (this section of his proposal includes our feedback).
>>  * Raphael started playing with the implementation early and had a few good ideas on how to further improve the code-clone hashing.
>>  * Raphael also proposed to try to feed back as much as possible to clang's mainline (something that we didn't have time to do last year).
>>  * Raphael has good experience with relevant projects such as his work on WebAssembly.
>>  If you are interested in more technical details please let us know.
>>  I find this proposal very reasonable and the candidacy very strong. From the CV and the proposal I think the candidate can do what he suggests and I'd be happy to mentor him. I'd be happy to hear Anna's comments on his proposal.
> Hi Vassil and Raphael,
> Sorry for the delay, I just got to reading your proposal. Below are some comments.
> If I understand correctly, you are proposing to:
>  1) Add another stand-alone tool + a library that performs cross-translation unit clone detection at the AST level.
>  2) Add a checker to the Clang Static Analyzer that performs (the same?) clone detection but limited to a single translation unit.
> How much code reuse will there be between the two? Will the stand-alone tool be built on top of the checker? I did not get that feeling from the proposal, especially, since the stand-alone tool will be completed first. It seems that all of the goals mentioned in the proposal except for the cross translation unit analysis could be done in the static analyzer. So why not start with that? I think it would be very beneficial for the project to have some clone detection committed in tree, immediately available to all of the existing users of the static analyzer! 
> One of the obstacles in contributing the existing checker to the analyzer is issue reporting. Have you considered reporting the subsequent clones as a note on the first clone? The clones are related and it looks like the current output does not highlight that. (The static analyzer does not support notes right now, so you'd need to extend that functionality.)
> I am very apprehensive about adding yet another analysis tool to the clang ecosystem. Having clang-tidy, the Clang Static Analyzer + yet another tool would be quite confusing to the user. The most user friendly approach is to have a single tool that highlights all the problems the users have in their code. I do acknowledge that it would not be possible to make the clone checker scale to cross translation unit analysis since we do not currently have the infrastructure to support that. However, building the stand-alone tool on top of the checker would allow turning it into a cross-translation unit checker once the infrastructure is added to the static analyzer.
> Have you looked into CodeChecker and the new scan-build.py projects? They do rely on using the compilation database, which is something you plan on doing as well. Can you reuse scan-build.py instead of writing your own build interposition? The goal of CodeChecker is to collect and display static analysis reports generated by all clang-based tools, specifically, both clang-tidy and the Clang Static Analyzer are already supported. It would be valuable if the new stand-alone tools would be incorporated into the same workflow. This way the users could have a single point of entry when they look for bugs.
> CodeChecker incorporates a nice bug viewing UI. Integrating clone reporting into that UI would be great. However, you might need to extend/modify both the reporting and the UI to make it look great.
> What do you think?
> Anna.
>> --Vassil
>> On 23/03/16 03:37, Raphael Isemann via cfe-dev wrote:
>>> Hi everybody,
>>> just wanted to post my GSoC proposal for feedback:
>>> https://docs.google.com/document/d/1hY_EUIqeQ6cAYaIqrePWvcy6XmkMv5BoxrfskEv5Tl0/edit?usp=sharing
>>> Feel free to comment on the document if you think something should be improved.
>>> I will also try to idle in the IRC channel from now on if someone prefers
>>> that. My nickname there is "teemperor".
>>> Cheers,
>>> Raphael Isemann
