[cfe-dev] CopyPaste detection clang static analyzer

Vassil Vassilev vvasilev at cern.ch
Mon Mar 9 01:56:48 PDT 2015


On 24/02/15 06:15, Anna Zaks wrote:
>
>> On Feb 18, 2015, at 2:50 AM, Vassil Vassilev <vvasilev at cern.ch 
>> <mailto:vvasilev at cern.ch>> wrote:
>>
>> That's great! What would be the next steps? Do you know who will be 
>> the GSoC org admin?
>
> There was an email sent about GSoC a couple of days ago to the LLVMDev 
> list.
Thanks for the information. I addressed all of your comments and sent a 
patch to OpenProjects.html, cc'ing you, Anna, for review.
Many thanks,
Vassil
>
>> Do you think we should improve the project description
>
> I think adding specific examples that we want to handle would be 
> useful in scoping this down.
>
>> and nominate a backup mentor?
>> Vassil
>> On 17/02/15 20:05, Anna Zaks wrote:
>>> This would be a very useful feature to have in the clang static 
>>> analyzer and can be scoped for a GSoC project!
>>>
>>> Anna.
>>>
>>>> On Feb 10, 2015, at 4:06 AM, Vassil Vassilev <vvasilev at cern.ch 
>>>> <mailto:vvasilev at cern.ch>> wrote:
>>>>
>>>> Hi all,
>>>>   I just wanted to bump this up (given GSoC is starting). I didn't 
>>>> manage to get a good student for this project (proposal is below) 
>>>> last year :(. I thought it might work better if we went through the 
>>>> LLVM mentoring organization. Do you think this would make a good 
>>>> GSoC project from Clang's perspective? I'd be happy to update the 
>>>> proposal to make it more attractive or general-purpose.
>>>> Vassil
>>>>
>>>>
>>>>       Code copy/paste detection
>>>>
>>>> *Description*: Copy/paste is a common programming practice. Most 
>>>> programmers start from a code snippet that already exists in the 
>>>> system and modify it to match their needs. Some code snippets 
>>>> easily end up being copied dozens of times, which hurts 
>>>> maintainability, understandability and logical design. 
>>>> Clang <http://clang.llvm.org/> and clang's static analyzer 
>>>> <http://clang-analyzer.llvm.org/> provide all the building blocks 
>>>> needed to build a generic C/C++ copy/paste detector.
>>>> *Expected results*: Build a standalone tool or clang plugin that can 
>>>> detect copy/pasted code.
>
> I think having this integrated into one of the existing clang tools 
> should be the goal. For example, the static analyzer is a good 
> fit. The static analyzer does not have plugins.
>
>>>> Lay the foundations for detecting slightly modified code 
>>>> (semantic analysis required). Implement tests for all the implemented 
>>>> functionality. Prepare a final poster of the work and be ready to 
>>>> present it.
>>>> *Required knowledge*: Advanced C++, Basic knowledge of Clang/Clang 
>>>> Static Analyzer.
>>>>
>>>> *Mentor*: Vassil Vassilev/ maybe somebody else as second mentor?
>>>>
>>>>
>>>> On 07/02/14 22:20, Nick Lewycky wrote:
>>>>> On 7 February 2014 04:49, Vassil Vassilev <vvasilev at cern.ch 
>>>>> <mailto:vvasilev at cern.ch>> wrote:
>>>>>
>>>>>     On 05/02/14 21:32, Nick Lewycky wrote:
>>>>>>     On 3 February 2014 14:08, Richard <legalize at xmission.com
>>>>>>     <mailto:legalize at xmission.com>> wrote:
>>>>>>
>>>>>>
>>>>>>         In article
>>>>>>         <CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw at mail.gmail.com
>>>>>>         <mailto:CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw at mail.gmail.com>>,
>>>>>>         David Blaikie <dblaikie at gmail.com
>>>>>>         <mailto:dblaikie at gmail.com>> writes:
>>>>>>
>>>>>>         > On Mon, Feb 3, 2014 at 3:06 AM, Vassil Vassilev
>>>>>>         <vvasilev at cern.ch <mailto:vvasilev at cern.ch>> wrote:
>>>>>>         >
>>>>>>         > >   A few months ago I was looking for a copy-paste
>>>>>>         detector for a C++
>>>>>>         > > project. I didn't find such a feature of clang's
>>>>>>         static analyzer. Is this
>>>>>>         > > the case?
>>>>>>         >
>>>>>>         > copy-paste detector? As in plagiarism detection?
>>>>>>
>>>>>>         I don't think plagiarism is the concern.  The concern is
>>>>>>         copy/paste of blocks of code where the pasted block needs
>>>>>>         to be updated in several places, but not all of the updates
>>>>>>         were performed.
>>>>>>
>>>>>>
>>>>>>     I've implemented this sort of thing, but it's only 80%
>>>>>>     finished and has been kicking around on the low-priority end
>>>>>>     of my todo list for the past couple of years. Patch attached.
>>>>>>     It'd be great if someone were interested in finishing this
>>>>>>     off. I won't get to it soon.
>>>>>>
>>>>>>     Note that it's a warning instead of a static analysis check,
>>>>>>     which means that it must have an aggressively low number of
>>>>>>     false positives, and that it must run quickly. The
>>>>>>     implementation I have analyzes conditional operators and
>>>>>>     if/else-if chains, but doesn't collect all the expressions
>>>>>>     through something like a && b && c && a. That would be the
>>>>>>     next thing to add.
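
A minimal sketch of the "collect all the expressions through a && b && c && a" step mentioned above, assuming a toy expression tree rather than Clang's AST. The Expr struct and the Leaf, And, CollectOperands and DuplicateOperands helpers are hypothetical names made up for illustration; they are not taken from the attached patch. The idea is to flatten the nested && nodes into one operand list and then report any operand that appears more than once.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical, not a Clang class): either a leaf
// identified by its spelling, or a binary "&&" with two children.
struct Expr {
  std::string leaf;                // non-empty for leaves
  std::shared_ptr<Expr> lhs, rhs;  // both set for "&&" nodes
};

std::shared_ptr<Expr> Leaf(const std::string &name) {
  assert(!name.empty() && "a leaf needs a spelling");
  auto e = std::make_shared<Expr>();
  e->leaf = name;
  return e;
}

std::shared_ptr<Expr> And(std::shared_ptr<Expr> l, std::shared_ptr<Expr> r) {
  auto e = std::make_shared<Expr>();
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Flatten a (possibly nested) && chain into its operands, in source order.
void CollectOperands(const Expr &e, std::vector<std::string> &out) {
  if (!e.leaf.empty()) {
    out.push_back(e.leaf);
    return;
  }
  CollectOperands(*e.lhs, out);
  CollectOperands(*e.rhs, out);
}

// Return every repeated occurrence of an operand within the chain.
std::vector<std::string> DuplicateOperands(const Expr &root) {
  std::vector<std::string> ops, dups;
  CollectOperands(root, ops);
  std::set<std::string> seen;
  for (const auto &op : ops)
    if (!seen.insert(op).second)  // insert fails if already seen
      dups.push_back(op);
  return dups;
}
```

For the chain ((a && b) && c) && a, DuplicateOperands returns a single entry, "a". A real check would compare operands structurally instead of by spelling, and would have to decide how to treat operands with side effects.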
>>>>>>
>>>>>>     It does have some really cool properties that we can only get
>>>>>>     because clang integrates closely with its preprocessor.
>>>>>>     Consider this sample from the testcase:
>>>>>>
>>>>>>     #define num_cpus() (1)
>>>>>>     #define max_omp_threads() (1)
>>>>>>     int test8(int expr) {
>>>>>>       if (expr) {
>>>>>>         return num_cpus();
>>>>>>       } else {
>>>>>>         return max_omp_threads();
>>>>>>       }
>>>>>>     }
>>>>>>
>>>>>>     We know better than to warn on that, even though the AST
>>>>>>     looks the same. If you instead write "return num_cpus();"
>>>>>>     twice, we warn on that (that's test9 in the testsuite).
>>>>>>
>>>>>>     Nick
>>>>>     Thanks, this looks very interesting. This may be a good start
>>>>>     for a student. IIUC, non-unique exprs are the ones that have the
>>>>>     same source ranges and the same FileIDs, right? Could this be
>>>>>     upgraded to AST-node (structural) comparison?
>>>>>
>>>>>
>>>>> It is an AST-node comparison. In order to handle the case of 
>>>>> different macros, we ask the AST nodes what their SourceLocation 
>>>>> was, and factor in the macro ID, if there was one. A large part of 
>>>>> the patch is a change to the Stmt::Profile logic to look at all 
>>>>> the SourceLocations in all the possible AST nodes.
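
To make the idea above concrete, here is a rough sketch of a structural profile that optionally factors in which macro a node came from. It assumes a toy Node type, not Clang's Stmt or FoldingSetNodeID; Node, ProfileID, Profile, SameProfile and ReturnOfMacro are all hypothetical names, and a macro name string stands in for whatever macro identifier the real patch derives from SourceLocation.

```cpp
#include <string>
#include <vector>

// Toy AST node (hypothetical). `macro` names the macro the node was
// expanded from, or is empty if the node was written directly.
struct Node {
  std::string kind;   // e.g. "ReturnStmt", "IntegerLiteral"
  std::string value;  // literal text, may be empty
  std::string macro;  // stand-in for a macro ID (assumption)
  std::vector<Node> children;
};

// A profile is a flat sequence of facts about a subtree; two subtrees
// are treated as copies of each other when their profiles are equal.
using ProfileID = std::vector<std::string>;

void Profile(const Node &n, bool useMacro, ProfileID &id) {
  id.push_back(n.kind);
  id.push_back(n.value);
  if (useMacro)
    id.push_back(n.macro);  // distinguishes num_cpus() from max_omp_threads()
  id.push_back(std::to_string(n.children.size()));
  for (const Node &c : n.children)
    Profile(c, useMacro, id);
}

bool SameProfile(const Node &a, const Node &b, bool useMacro) {
  ProfileID pa, pb;
  Profile(a, useMacro, pa);
  Profile(b, useMacro, pb);
  return pa == pb;
}

// Models `return M();` where M expands to `(1)`.
Node ReturnOfMacro(const std::string &macro) {
  Node lit{"IntegerLiteral", "1", macro, {}};
  Node paren{"ParenExpr", "", macro, {lit}};
  return Node{"ReturnStmt", "", "", {paren}};
}
```

With useMacro set to false, ReturnOfMacro("num_cpus") and ReturnOfMacro("max_omp_threads") have equal profiles, which is the false positive from the test8 sample. With useMacro set to true they differ, while two returns of the same macro still compare equal, matching the test9 behaviour described earlier in the thread.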
>>>>>
>>>>>
>>>>>     Vassil
>>>>>
>>>>>>
>>>>>>         Coverity, for instance, can detect such cases.
>>>>>>
>>>>>>         Here is an article from 2006 describing such a tool:
>>>>>>         <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113>
>>>>>>
>>>>>>         Wikipedia says PMD has a copy/paste detector that works
>>>>>>         with C++:
>>>>>>         <http://en.wikipedia.org/wiki/PMD_(software)#Copy.2FPaste_Detector_.28CPD.29>
>>>>>>
>>>>>>         "Note that CPD works with Java, JSP, C, C++, C#, Fortran
>>>>>>         and PHP code.
>>>>>>         Your own language is missing ? See how to add it here"
>>>>>>         <http://pmd.sourceforge.net/snapshot/cpd-usage.html>
>>>>>>         --
>>>>>>         "The Direct3D Graphics Pipeline" free book
>>>>>>         <http://tinyurl.com/d3d-pipeline>
>>>>>>              The Computer Graphics Museum
>>>>>>         <http://ComputerGraphicsMuseum.org
>>>>>>         <http://computergraphicsmuseum.org/>>
>>>>>>                  The Terminals Wiki
>>>>>>         <http://terminals.classiccmp.org
>>>>>>         <http://terminals.classiccmp.org/>>
>>>>>>           Legalize Adulthood! (my blog)
>>>>>>         <http://LegalizeAdulthood.wordpress.com
>>>>>>         <http://legalizeadulthood.wordpress.com/>>
>>>>>>         _______________________________________________
>>>>>>         cfe-dev mailing list
>>>>>>         cfe-dev at cs.uiuc.edu <mailto:cfe-dev at cs.uiuc.edu>
>>>>>>         http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> -- 
>>>> --------------------------------------------
>>>> Q: Why is this email five sentences or less?
>>>> A:http://five.sentenc.es
>

