[cfe-dev] CopyPaste detection clang static analyzer

Vassil Vassilev vvasilev at cern.ch
Wed Feb 18 02:50:15 PST 2015


That's great! What would be the next steps? Do you know who will be the 
GSoC org admin? Do you think we should improve the project description 
and nominate a backup mentor?
Vassil
On 17/02/15 20:05, Anna Zaks wrote:
> This would be a very useful feature to have in the clang static 
> analyzer and can be scoped for a GSoC project!
>
> Anna.
>
>> On Feb 10, 2015, at 4:06 AM, Vassil Vassilev <vvasilev at cern.ch 
>> <mailto:vvasilev at cern.ch>> wrote:
>>
>> Hi all,
>>   I just wanted to bump this up (given that GSoC is starting). I didn't
>> manage to get a good student for this project (proposal is below)
>> last year :(. I thought it might work better if we went through the
>> LLVM mentoring organization. Do you think this would make a good
>> GSoC project from Clang's perspective? I'd be happy to update the
>> proposal to make it more attractive or general-purpose.
>> Vassil
>>
>>
>>       Code copy/paste detection
>>
>> *Description*: Copy/paste is a common programming practice. Most
>> programmers start from a code snippet that already exists in the
>> system and modify it to match their needs. Some snippets easily end
>> up being copied dozens of times, which hurts maintainability,
>> understandability and logical design. Clang <http://clang.llvm.org/>
>> and clang's static analyzer <http://clang-analyzer.llvm.org/> provide
>> all the building blocks to build a generic C/C++ copy/paste detector.
>> *Expected results*: Build a standalone tool or clang plugin able to
>> detect copy/pasted code. Lay the foundations for detecting slightly
>> modified code (semantic analysis required). Implement tests for all
>> the implemented functionality. Prepare a final poster of the work and
>> be ready to present it.
>> *Required knowledge*: Advanced C++, Basic knowledge of Clang/Clang 
>> Static Analyzer.
>>
>> *Mentor*: Vassil Vassilev/ maybe somebody else as second mentor?
>>
>>
>> On 07/02/14 22:20, Nick Lewycky wrote:
>>> On 7 February 2014 04:49, Vassil Vassilev <vvasilev at cern.ch 
>>> <mailto:vvasilev at cern.ch>> wrote:
>>>
>>>     On 05/02/14 21:32, Nick Lewycky wrote:
>>>>     On 3 February 2014 14:08, Richard <legalize at xmission.com
>>>>     <mailto:legalize at xmission.com>> wrote:
>>>>
>>>>
>>>>         In article
>>>>         <CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw at mail.gmail.com
>>>>         <mailto:CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw at mail.gmail.com>>,
>>>>             David Blaikie <dblaikie at gmail.com
>>>>         <mailto:dblaikie at gmail.com>> writes:
>>>>
>>>>         > On Mon, Feb 3, 2014 at 3:06 AM, Vassil Vassilev
>>>>         <vvasilev at cern.ch <mailto:vvasilev at cern.ch>> wrote:
>>>>         >
>>>>         > >   A few months ago I was looking for a copy-paste
>>>>         detector for a C++
>>>>         > > project. I didn't find such a feature of clang's static
>>>>         analyzer. Is this
>>>>         > > the case?
>>>>         >
>>>>         > copy-paste detector? As in plagiarism detection?
>>>>
>>>>         I don't think plagiarism is the concern.  The concern is
>>>>         copy/paste of blocks of code where the pasted block needs to be
>>>>         updated in several places, but not all of the updates were
>>>>         performed.
>>>>
>>>>
>>>>     I've implemented this sort of thing, but it's only 80% finished
>>>>     and has been kicking around on the low-priority end of my todo
>>>>     list for the past couple of years. Patch attached. It'd be
>>>>     great if someone were interested in finishing this off. I won't
>>>>     get to it soon.
>>>>
>>>>     Note that it's a warning instead of a static analysis check,
>>>>     which means that it must have an aggressively low number of
>>>>     false positives, and that it must run quickly. The
>>>>     implementation I have analyzes conditional operators and
>>>>     if/else-if chains, but doesn't collect all the expressions
>>>>     through something like a && b && c && a. That would be the next
>>>>     thing to add.
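
A minimal sketch of the "collect all the expressions through a && b && c && a" step, assuming a toy expression tree rather than clang's AST. The Expr type and the var, land, collectOperands and repeatedOperands helpers are hypothetical names made up for illustration; they are not part of the attached patch or of clang.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical; not a clang class).
struct Expr {
  enum Kind { Var, And } kind;
  std::string name;                // used when kind == Var
  std::unique_ptr<Expr> lhs, rhs;  // used when kind == And
};

// Build a variable reference.
std::unique_ptr<Expr> var(std::string n) {
  auto e = std::make_unique<Expr>();
  e->kind = Expr::Var;
  e->name = std::move(n);
  return e;
}

// Build "l && r".
std::unique_ptr<Expr> land(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
  auto e = std::make_unique<Expr>();
  e->kind = Expr::And;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Flatten a nested && chain into its leaf operands, in source order.
void collectOperands(const Expr &e, std::vector<const Expr *> &out) {
  if (e.kind == Expr::And) {
    assert(e.lhs && e.rhs && "an && node needs two operands");
    collectOperands(*e.lhs, out);
    collectOperands(*e.rhs, out);
  } else {
    out.push_back(&e);
  }
}

// Return each operand name that occurs more than once in the chain,
// reported once, in order of its first repetition.
std::vector<std::string> repeatedOperands(const Expr &root) {
  std::vector<const Expr *> ops;
  collectOperands(root, ops);
  std::set<std::string> seen, reported;
  std::vector<std::string> dups;
  for (const Expr *op : ops) {
    bool firstTime = seen.insert(op->name).second;
    if (!firstTime && reported.insert(op->name).second)
      dups.push_back(op->name);
  }
  return dups;
}
```

For ((a && b) && c) && a this reports only a. A real check would compare operands structurally instead of by name, and would have to skip operands with side effects to keep false positives low.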
>>>>
>>>>     It does have some really cool properties that we can only get
>>>>     because clang integrates closely with its preprocessor.
>>>>     Consider this sample from the testcase:
>>>>
>>>>     #define num_cpus() (1)
>>>>     #define max_omp_threads() (1)
>>>>     int test8(int expr) {
>>>>       if (expr) {
>>>>         return num_cpus();
>>>>       } else {
>>>>         return max_omp_threads();
>>>>       }
>>>>     }
>>>>
>>>>     We know better than to warn on that, even though the AST looks
>>>>     the same. If you instead write "return num_cpus();" twice, we
>>>>     warn on that (that's test9 in the testsuite).
>>>>
>>>>     Nick
>>>     Thanks, this looks very interesting. This may be a good start for
>>>     a student. IIUC, non-unique exprs are the ones that have the same
>>>     source ranges and the same FileIDs, right? Could this be upgraded
>>>     to AST-node (structural) comparison?
>>>
>>>
>>> It is an AST-node comparison. In order to handle the case of
>>> different macros, we ask the AST nodes what their SourceLocation
>>> was, and factor in the macro ID, if there was one. A large part of
>>> the patch is a change to the Stmt::Profile logic to look at all the
>>> SourceLocations in all the possible AST nodes.
>>>
>>>
>>>     Vassil
>>>
>>>>
>>>>         Coverity can detect such instances, for instance.
>>>>
>>>>         Here is an article from 2006 describing such a tool:
>>>>         <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113>
>>>>
>>>>         Wikipedia says PMD has a copy/paste detector that works
>>>>         with C++:
>>>>         <http://en.wikipedia.org/wiki/PMD_(software)#Copy.2FPaste_Detector_.28CPD.29
>>>>         <http://en.wikipedia.org/wiki/PMD_%28software%29#Copy.2FPaste_Detector_.28CPD.29>>
>>>>
>>>>         "Note that CPD works with Java, JSP, C, C++, C#, Fortran
>>>>         and PHP code.
>>>>         Your own language is missing ? See how to add it here"
>>>>         <http://pmd.sourceforge.net/snapshot/cpd-usage.html>
>>>>         --
>>>>         "The Direct3D Graphics Pipeline" free book
>>>>         <http://tinyurl.com/d3d-pipeline>
>>>>              The Computer Graphics Museum
>>>>         <http://ComputerGraphicsMuseum.org
>>>>         <http://computergraphicsmuseum.org/>>
>>>>                  The Terminals Wiki
>>>>         <http://terminals.classiccmp.org
>>>>         <http://terminals.classiccmp.org/>>
>>>>           Legalize Adulthood! (my blog)
>>>>         <http://LegalizeAdulthood.wordpress.com
>>>>         <http://legalizeadulthood.wordpress.com/>>
>>>>         _______________________________________________
>>>>         cfe-dev mailing list
>>>>         cfe-dev at cs.uiuc.edu <mailto:cfe-dev at cs.uiuc.edu>
>>>>         http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>> -- 
>> --------------------------------------------
>> Q: Why is this email five sentences or less?
>> A:http://five.sentenc.es