[cfe-dev] CopyPaste detection clang static analyzer
Vassil Vassilev
vvasilev at cern.ch
Mon Mar 9 01:56:48 PDT 2015
On 24/02/15 06:15, Anna Zaks wrote:
>
>> On Feb 18, 2015, at 2:50 AM, Vassil Vassilev <vvasilev at cern.ch> wrote:
>>
>> That's great! What would be the next steps? Do you know who will be
>> the GSoC org admin?
>
> There was an email sent about GSoC a couple of days ago to the LLVMDev
> list.
Thanks for the information. I addressed all of your comments and sent a
patch to OpenProjects.html, cc-ing you as well, Anna, for review.
Many thanks,
Vassil
>
>> Do you think we should improve the project description
>
> I think adding specific examples that we want to handle would be
> useful in scoping this down.
>
>> and nominate a backup mentor?
>> Vassil
>> On 17/02/15 20:05, Anna Zaks wrote:
>>> This would be a very useful feature to have in the clang static
>>> analyzer and can be scoped for a GSoC project!
>>>
>>> Anna.
>>>
>>>> On Feb 10, 2015, at 4:06 AM, Vassil Vassilev <vvasilev at cern.ch> wrote:
>>>>
>>>> Hi all,
>>>> I just wanted to bump this up (given GSoC is starting). I didn't
>>>> manage to get a good student for this project (proposal is below)
>>>> last year :(. I thought it might work better if we went through the
>>>> LLVM mentoring organization. Do you think this would make a good
>>>> GSoC project from Clang's perspective? I'd be happy to update the
>>>> proposal to make it more attractive or general-purpose.
>>>> Vassil
>>>>
>>>>
>>>> Code copy/paste detection
>>>>
>>>> *Description*: Copy/paste is a common programming practice. Most
>>>> programmers start from a code snippet that already exists in the
>>>> system and modify it to match their needs. Some code snippets easily
>>>> end up being copied dozens of times, which hurts maintainability,
>>>> understandability and logical design.
>>>> Clang <http://clang.llvm.org/> and clang's static analyzer
>>>> <http://clang-analyzer.llvm.org/> provide all the building blocks
>>>> to build a generic C/C++ copy/paste detector.
>>>> *Expected results*: Build a standalone tool or clang plugin able
>>>> to detect copy/pasted code.
>
> I think having this integrated into one of the existing clang tools
> should be the goal. For example, the static analyzer is a good
> fit. The static analyzer does not have plugins.
>
>>>> Lay the foundations for detecting slightly modified code
>>>> (semantic analysis required). Implement tests for all the implemented
>>>> functionality. Prepare a final poster of the work and be ready to
>>>> present it.
>>>> *Required knowledge*: Advanced C++, Basic knowledge of Clang/Clang
>>>> Static Analyzer.
>>>>
>>>> *Mentor*: Vassil Vassilev / maybe somebody else as a second mentor?
>>>> <mailto:sft-gsoc-AT-cern-dot-ch?subject=GSoC%202014%20Extending%20Cling>
>>>>
>>>>
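To make the proposal's two goals concrete (finding exact copies and laying the foundations for catching slightly modified ones), here is a minimal sketch of token-based clone detection. This is a swapped-in technique, not something the proposal prescribes, and it is not Clang-based: it assumes a hypothetical toy tokenizer (`tokenize`), treats every identifier outside a small hard-coded keyword list as interchangeable (`normalize`), and reports every pair of offsets where a window of normalized tokens matches (`findClones`). All of these names are illustrative only; a Clang-based detector would work on the AST instead.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal tokenizer: identifiers/keywords, integer literals and
// single-character punctuation. This is NOT a real C/C++ lexer.
std::vector<std::string> tokenize(const std::string &src) {
  std::vector<std::string> toks;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = src[i];
    if (std::isspace(c)) { ++i; continue; }
    std::size_t j = i;
    if (std::isalpha(c) || c == '_') {
      while (j < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_'))
        ++j;
    } else if (std::isdigit(c)) {
      while (j < src.size() && std::isdigit(static_cast<unsigned char>(src[j])))
        ++j;
    } else {
      j = i + 1;  // any other character is a one-character token
    }
    toks.push_back(src.substr(i, j - i));
    i = j;
  }
  return toks;
}

// Replace every non-keyword identifier with a placeholder, so code that
// differs only in names compares equal. The keyword list is an assumption.
std::vector<std::string> normalize(const std::vector<std::string> &toks) {
  static const std::set<std::string> keywords = {"if", "else", "for",
                                                 "while", "return", "int"};
  std::vector<std::string> out;
  for (const auto &t : toks) {
    bool ident = std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_';
    out.push_back(ident && !keywords.count(t) ? "$id" : t);
  }
  return out;
}

// Report (offset in a, offset in b) for every window of `window` normalized
// tokens that occurs in both inputs.
std::vector<std::pair<std::size_t, std::size_t>>
findClones(const std::string &a, const std::string &b, std::size_t window) {
  auto na = normalize(tokenize(a)), nb = normalize(tokenize(b));
  std::map<std::vector<std::string>, std::vector<std::size_t>> index;
  for (std::size_t i = 0; i + window <= na.size(); ++i) {
    std::vector<std::string> key(na.begin() + i, na.begin() + i + window);
    index[key].push_back(i);
  }
  std::vector<std::pair<std::size_t, std::size_t>> hits;
  for (std::size_t j = 0; j + window <= nb.size(); ++j) {
    std::vector<std::string> key(nb.begin() + j, nb.begin() + j + window);
    auto it = index.find(key);
    if (it == index.end()) continue;
    for (std::size_t i : it->second) hits.emplace_back(i, j);
  }
  return hits;
}
```

With a window of 8 tokens, `findClones("int sum(int a, int b) { return a + b; }", "int add(int x, int y) { return x + y; }", 8)` reports a match at offsets (0, 0), because the two functions differ only in identifier names. Exact string matching on windows keeps the sketch short; practical tools typically use rolling hashes or suffix structures to scale to large code bases.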
>>>> On 07/02/14 22:20, Nick Lewycky wrote:
>>>>> On 7 February 2014 04:49, Vassil Vassilev <vvasilev at cern.ch> wrote:
>>>>>
>>>>> On 05/02/14 21:32, Nick Lewycky wrote:
>>>>>> On 3 February 2014 14:08, Richard <legalize at xmission.com> wrote:
>>>>>>
>>>>>>
>>>>>> In article
>>>>>> <CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw at mail.gmail.com>,
>>>>>> David Blaikie <dblaikie at gmail.com> writes:
>>>>>>
>>>>>> > On Mon, Feb 3, 2014 at 3:06 AM, Vassil Vassilev
>>>>>> > <vvasilev at cern.ch> wrote:
>>>>>> >
>>>>>> > > A few months ago I was looking for a copy-paste detector for a C++
>>>>>> > > project. I didn't find such a feature of clang's static analyzer.
>>>>>> > > Is this the case?
>>>>>> >
>>>>>> > copy-paste detector? As in plagiarism detection?
>>>>>>
>>>>>> I don't think plagiarism is the concern. The concern is
>>>>>> copy/pasted blocks of code where the pasted block needs to be
>>>>>> updated in several places, but not all of the updates
>>>>>> were performed.
>>>>>>
>>>>>>
>>>>>> I've implemented this sort of thing, but it's only 80%
>>>>>> finished and has been kicking around on the low-priority end
>>>>>> of my todo list for the past couple of years. Patch attached.
>>>>>> It'd be great if someone were interested in finishing this
>>>>>> off. I won't get to it soon.
>>>>>>
>>>>>> Note that it's a warning instead of a static analysis check,
>>>>>> which means that it must have an aggressively low number of
>>>>>> false positives, and that it must run quickly. The
>>>>>> implementation I have analyzes conditional operators and
>>>>>> if/else-if chains, but doesn't collect all the expressions
>>>>>> through something like a && b && c && a. That would be the
>>>>>> next thing to add.
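The missing piece mentioned above, collecting every operand of a chain like `a && b && c && a`, can be sketched as a recursive walk that flattens nested `&&` nodes and then looks for repeated operands. This is not code from the attached patch: it uses a hypothetical toy expression type (`Expr`, `makeLeaf`, `makeAnd`) rather than Clang's AST classes, and it compares operands by spelling only, whereas a real implementation would compare them structurally, for example with the profile-based comparison discussed in this thread.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy expression node; not Clang's BinaryOperator/DeclRefExpr.
struct Expr {
  bool isAnd = false;    // true for an "&&" node
  std::string spelling;  // operand text for a leaf, e.g. "a"
  std::shared_ptr<Expr> lhs, rhs;
};

std::shared_ptr<Expr> makeLeaf(std::string spelling) {
  auto e = std::make_shared<Expr>();
  e->spelling = std::move(spelling);
  return e;
}

std::shared_ptr<Expr> makeAnd(std::shared_ptr<Expr> l, std::shared_ptr<Expr> r) {
  auto e = std::make_shared<Expr>();
  e->isAnd = true;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Flatten a chain of && (nested on either side) into its operands, left to right.
void collectAndOperands(const Expr &e, std::vector<const Expr *> &out) {
  if (e.isAnd) {
    collectAndOperands(*e.lhs, out);
    collectAndOperands(*e.rhs, out);
  } else {
    out.push_back(&e);
  }
}

// Spellings of operands that occur more than once, each reported once.
std::vector<std::string> duplicateAndOperands(const Expr &e) {
  std::vector<const Expr *> operands;
  collectAndOperands(e, operands);
  std::set<std::string> seen, reported;
  std::vector<std::string> duplicates;
  for (const Expr *op : operands)
    if (!seen.insert(op->spelling).second && reported.insert(op->spelling).second)
      duplicates.push_back(op->spelling);
  return duplicates;
}
```

Since `&&` is left-associative, `a && b && c && a` parses as `((a && b) && c) && a`; building that tree with `makeAnd`/`makeLeaf` and calling `duplicateAndOperands` on it returns `{"a"}`.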
>>>>>>
>>>>>> It does have some really cool properties that we can only get
>>>>>> because clang integrates closely with its preprocessor.
>>>>>> Consider this sample from the testcase:
>>>>>>
>>>>>> #define num_cpus() (1)
>>>>>> #define max_omp_threads() (1)
>>>>>> int test8(int expr) {
>>>>>>   if (expr) {
>>>>>>     return num_cpus();
>>>>>>   } else {
>>>>>>     return max_omp_threads();
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> We know better than to warn on that, even though the AST
>>>>>> looks the same. If you instead write "return num_cpus();"
>>>>>> twice, we warn on that (that's test9 in the testsuite).
>>>>>>
>>>>>> Nick
>>>>> Thanks, this looks very interesting. This may be a good start
>>>>> for a student. IIUC, non-unique exprs are the ones that have the
>>>>> same source ranges and the same FileIDs, right? Could this be
>>>>> upgraded to AST-node (structural) comparison?
>>>>>
>>>>>
>>>>> It is an AST-node comparison. In order to handle the case of
>>>>> different macros, we ask the AST nodes what their SourceLocation
>>>>> was, and factor in the macro ID, if there was one. A large part of
>>>>> the patch is a change to the Stmt::Profile logic to look at all
>>>>> the SourceLocations in all the possible AST nodes.
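Folding macro information into a structural comparison can be illustrated with a toy model. This is a sketch under stated assumptions, not the patch's implementation: a hypothetical `Node` records its kind, an optional literal value and the name of the macro it was expanded from (empty if none), and a hypothetical `profile` function serializes all of that into a string that stands in for the fingerprint Clang's `Stmt::Profile` computes. Because the macro name is part of the fingerprint, the two branches of `test8` compare different, while two copies of `return num_cpus();` (the `test9` case) compare equal. The string encoding is chosen for readability and can collide for adversarial values.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy AST node; this is not Clang's Stmt hierarchy.
struct Node {
  std::string kind;       // e.g. "ReturnStmt", "IntegerLiteral"
  std::string value;      // literal value, empty if none
  std::string macroName;  // macro this node was expanded from, empty if none
  std::vector<Node> children;
};

// Serialize a node and its subtree into a fingerprint string. Including
// macroName mimics "factor in the macro ID, if there was one".
void profile(const Node &n, std::string &out) {
  out += n.kind + '(' + n.value + '@' + n.macroName;
  for (const Node &child : n.children)
    profile(child, out);
  out += ')';
}

// True if the two branches of an if/else would be reported as identical.
bool identicalBranches(const Node &thenBranch, const Node &elseBranch) {
  std::string a, b;
  profile(thenBranch, a);
  profile(elseBranch, b);
  return a == b;
}
```

For example, modelling each branch as a `ReturnStmt` whose only child is the literal `1` expanded from a given macro, `identicalBranches` is false for `num_cpus` versus `max_omp_threads` and true for `num_cpus` versus `num_cpus`, even though the macro-free structure is the same in both cases.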
>>>>>
>>>>>
>>>>> Vassil
>>>>>
>>>>>>
>>>>>> Coverity can detect such cases, for example.
>>>>>>
>>>>>> Here is an article from 2006 describing such a tool:
>>>>>> <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113>
>>>>>>
>>>>>> Wikipedia says PMD has a copy/paste detector that works
>>>>>> with C++:
>>>>>> <http://en.wikipedia.org/wiki/PMD_(software)#Copy.2FPaste_Detector_.28CPD.29>
>>>>>>
>>>>>> "Note that CPD works with Java, JSP, C, C++, C#, Fortran
>>>>>> and PHP code.
>>>>>> Your own language is missing ? See how to add it here"
>>>>>> <http://pmd.sourceforge.net/snapshot/cpd-usage.html>
>>>>>> --
>>>>>> "The Direct3D Graphics Pipeline" free book
>>>>>> <http://tinyurl.com/d3d-pipeline>
>>>>>> The Computer Graphics Museum
>>>>>> <http://ComputerGraphicsMuseum.org>
>>>>>> The Terminals Wiki
>>>>>> <http://terminals.classiccmp.org>
>>>>>> Legalize Adulthood! (my blog)
>>>>>> <http://LegalizeAdulthood.wordpress.com>
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at cs.uiuc.edu <mailto:cfe-dev at cs.uiuc.edu>
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --------------------------------------------
>>>> Q: Why is this email five sentences or less?
>>>> A: http://five.sentenc.es
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20150309/04b76894/attachment.html>