<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">This would be a very useful feature to have in the clang static analyzer and can be scoped for a GSoC project!</div><div class=""><br class=""></div><div class="">Anna.</div><div class=""><br class=""></div><div class=""><div><blockquote type="cite" class=""><div class="">On Feb 10, 2015, at 4:06 AM, Vassil Vassilev <<a href="mailto:vvasilev@cern.ch" class="">vvasilev@cern.ch</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" class="">
<div text="#000000" bgcolor="#FFFFFF" class="">
<div class="moz-cite-prefix">Hi all,<br class="">
I just wanted to bump this up (given that GSoC is starting). I didn't
manage to find a good student for this project (proposal is below)
last year :(. I thought it might work better if we went through the
LLVM mentoring organization. Do you think this would
make a good GSoC project from Clang's perspective? I'd be happy to
update the proposal to make it more attractive or general-purpose.<br class="">
Vassil<br class="">
<br class="">
<meta http-equiv="content-type" content="text/html; charset=utf-8" class="">
<h3 class="">Code copy/paste detection</h3>
<div class=""><strong class="">Description</strong>: Copy/paste is a common
programming practice. Most programmers start from a code
snippet that already exists in the system and modify it to match
their needs. Some snippets easily end up being
copied dozens of times, which hurts maintainability,
understandability and logical design. <a class="ext" href="http://clang.llvm.org/">Clang</a>
and <a class="ext" href="http://clang-analyzer.llvm.org/">clang's static
analyzer</a> provide all the
building blocks to build a generic C/C++ copy/paste detector.</div>
<div class=""><strong class="">Expected results</strong>: Build a standalone tool or
clang plugin that can detect copy/pasted code. Lay the
foundations for detecting slightly modified code (semantic
analysis required). Implement tests for all the implemented
functionality. Prepare a final poster of the work and be ready
to present it.</div>
<div class=""><strong class="">Required knowledge</strong>: Advanced C++, basic
knowledge of Clang/Clang Static Analyzer.</div><p class=""><strong class="">Mentor</strong>: Vassil Vassilev, maybe with somebody else
as a second mentor?</p>
<br class="">
On 07/02/14 22:20, Nick Lewycky wrote:<br class="">
</div>
<blockquote cite="mid:CADbEz-hdxzO6VFrRPewungnLxAPKZ7po1C07r5STaeV8z_+qpg@mail.gmail.com" type="cite" class="">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" class="">
<div dir="ltr" class="">
<div class="gmail_extra">
<div class="gmail_quote">On 7 February 2014 04:49, Vassil
Vassilev <span dir="ltr" class=""><<a moz-do-not-send="true" href="mailto:vvasilev@cern.ch" target="_blank" class="">vvasilev@cern.ch</a>></span>
wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000" class="">
<div class="im">
<div class="">On 05/02/14 21:32, Nick Lewycky wrote:<br class="">
</div>
<blockquote type="cite" class="">
<div dir="ltr" class="">
<div class="gmail_extra">
<div class="gmail_quote">On 3 February 2014
14:08, Richard <span dir="ltr" class=""><<a moz-do-not-send="true" href="mailto:legalize@xmission.com" target="_blank" class="">legalize@xmission.com</a>></span>
wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br class="">
In article <<a moz-do-not-send="true" href="mailto:CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw@mail.gmail.com" target="_blank" class="">CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw@mail.gmail.com</a>>,<br class="">
<div class=""> David Blaikie <<a moz-do-not-send="true" href="mailto:dblaikie@gmail.com" target="_blank" class="">dblaikie@gmail.com</a>>
writes:<br class="">
<br class="">
> On Mon, Feb 3, 2014 at 3:06 AM,
Vassil Vassilev <<a moz-do-not-send="true" href="mailto:vvasilev@cern.ch" target="_blank" class="">vvasilev@cern.ch</a>>
wrote:<br class="">
><br class="">
</div>
<div class="">> > A few months ago I was
looking for a copy-paste detector for a
C++<br class="">
> > project. I didn't find such a
feature of clang's static analyzer. Is
this<br class="">
> > the case?<br class="">
><br class="">
> copy-paste detector? As in plagiarism
detection?<br class="">
<br class="">
</div>
I don't think plagiarism is the concern.
The concern is<br class="">
copy/paste of blocks of code where the
pasted block needs to be<br class="">
updated in several places, but not all of
the updates were performed.<br class="">
</blockquote>
<div class=""><br class="">
</div>
<div class="">I've implemented this sort of thing, but
it's only 80% finished and has been kicking
around on the low-priority end of my todo
list for the past couple of years. Patch
attached. It'd be great if someone were
interested in finishing this off. I won't
get to it soon.</div>
<div class=""><br class="">
</div>
<div class="">Note that it's a warning instead of a
static analysis check, which means that it
must have an aggressively low number of
false positives and that it must run
quickly. The implementation I have analyzes
conditional operators and if/else-if chains,
but doesn't collect all the expressions
through something like a && b
&& c && a. That would be the
next thing to add.</div>
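<div class=""><br class="">
</div>
<div class="">As a rough illustration of that missing step, the sketch below flattens a nested && chain into its operands and reports the ones that repeat. It is only a toy under stated assumptions: Expr, leaf, logicalAnd, collectOperands and duplicateOperands are hypothetical names made up for this example, operands are compared by their spelling, and none of it is taken from the attached patch or from Clang's AST classes.</div>

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy expression node; this is not a Clang AST class.
struct Expr {
  std::string op;                 // "&&" for logical AND, empty for a leaf
  std::string name;               // leaf spelling, e.g. "a"
  std::shared_ptr<Expr> lhs, rhs; // children of a binary node, null for leaves
};

using ExprPtr = std::shared_ptr<Expr>;

// Build a leaf operand such as `a`.
ExprPtr leaf(const std::string &name) {
  return std::make_shared<Expr>(Expr{"", name, nullptr, nullptr});
}

// Build `l && r`.
ExprPtr logicalAnd(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{"&&", "", std::move(l), std::move(r)});
}

// Walk nested "&&" nodes and append every non-"&&" operand in source order.
void collectOperands(const ExprPtr &e, std::vector<std::string> &out) {
  if (e->op == "&&") {
    collectOperands(e->lhs, out);
    collectOperands(e->rhs, out);
  } else {
    out.push_back(e->name);
  }
}

// Return each operand occurrence that repeats an earlier operand of the chain.
std::vector<std::string> duplicateOperands(const ExprPtr &root) {
  std::vector<std::string> operands;
  collectOperands(root, operands);

  std::set<std::string> seen;
  std::vector<std::string> dups;
  for (const std::string &o : operands)
    if (!seen.insert(o).second) // insert fails when the operand was seen before
      dups.push_back(o);
  return dups;
}
```

<div class="">For a && b && c && a, built as logicalAnd(logicalAnd(logicalAnd(leaf("a"), leaf("b")), leaf("c")), leaf("a")), duplicateOperands would report the second a. A real warning would presumably compare AST profiles rather than spellings.</div>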
<div class=""><br class="">
</div>
<div class="">It does have some really cool properties
that we can only get because clang
integrates closely with its preprocessor.
Consider this sample from the testcase:</div>
<div class=""><br class="">
#define num_cpus() (1)<br class="">
#define max_omp_threads() (1)<br class="">
int test8(int expr) {<br class="">
if (expr) {<br class="">
return num_cpus();<br class="">
} else {<br class="">
return max_omp_threads();<br class="">
}<br class="">
}</div>
<div class=""><br class="">
</div>
<div class="">We know better than to warn on that, even
though the AST looks the same. If you
instead write "return num_cpus();" twice, we
warn on that (that's test9 in the
testsuite).</div>
<div class=""><br class="">
</div>
<div class="">Nick</div>
</div>
</div>
</div>
</blockquote>
</div>
Thanks, this looks very interesting. This may be a good
start for a student. IIUC, non-unique exprs are the ones
that have the same source ranges and the same FileIDs, right?
Could this be upgraded to AST-node (structural)
comparison?</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">It is an AST-node comparison. In order to handle the
case of different macros, we ask the AST nodes what their
SourceLocation was, and factor in the macro ID, if there
was one. A large part of the patch is a change to the
Stmt::Profile logic to look at all the SourceLocations in
all the possible AST nodes.</div>
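<div class=""><br class="">
</div>
<div class="">To make the idea concrete, here is a minimal sketch of a structural profile that can optionally factor in which macro a node came from. Everything in it is an assumption for illustration: Node, Profile, profileInto, sameProfile and returnOne are hypothetical, the macro is tracked as a plain name rather than through SourceLocations, and this is not the Stmt::Profile change from the patch.</div>

```cpp
#include <string>
#include <vector>

// Hypothetical toy node; real Clang nodes carry SourceLocations instead.
struct Node {
  std::string kind;           // e.g. "ReturnStmt", "IntegerLiteral"
  std::string value;          // literal or identifier spelling, may be empty
  std::string macro;          // macro the node was expanded from, "" if none
  std::vector<Node> children; // sub-nodes in source order
};

// A profile is a flat sequence of fields; two nodes match when their
// sequences are equal.
using Profile = std::vector<std::string>;

// Append the node's fields, then its children, in pre-order.
void profileInto(const Node &n, bool includeMacro, Profile &out) {
  out.push_back(n.kind);
  out.push_back(n.value);
  if (includeMacro)
    out.push_back(n.macro); // factor in the macro identity, if requested
  out.push_back(std::to_string(n.children.size()));
  for (const Node &c : n.children)
    profileInto(c, includeMacro, out);
}

// Compare two subtrees by profile.
bool sameProfile(const Node &a, const Node &b, bool includeMacro) {
  Profile pa, pb;
  profileInto(a, includeMacro, pa);
  profileInto(b, includeMacro, pb);
  return pa == pb;
}

// Model `return M();` where the macro M expands to `(1)`.
Node returnOne(const std::string &macro) {
  Node literal{"IntegerLiteral", "1", macro, {}};
  Node paren{"ParenExpr", "", macro, {literal}};
  return Node{"ReturnStmt", "", "", {paren}};
}
```

<div class="">With these definitions, returnOne("num_cpus") and returnOne("max_omp_threads") compare equal when includeMacro is false, because both expand to (1), and compare different when it is true. Two copies of returnOne("num_cpus") match either way, which corresponds to the case that should warn.</div>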
<div class=""> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000" class=""><span class="HOEnZb"><font color="#888888" class=""><br class="">
Vassil</font></span>
<div class="im"><br class="">
<blockquote type="cite" class="">
<div dir="ltr" class="">
<div class="gmail_extra">
<div class="gmail_quote">
<div class=""><br class="">
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Coverity
can detect such cases, for instance.<br class="">
<br class="">
Here is an article from 2006 describing such
a tool:<br class="">
<<a moz-do-not-send="true" href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113" target="_blank" class="">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113</a>><br class="">
<br class="">
Wikipedia says PMD has a copy/paste detector
that works with C++:<br class="">
<<a moz-do-not-send="true" href="http://en.wikipedia.org/wiki/PMD_%28software%29#Copy.2FPaste_Detector_.28CPD.29" target="_blank" class="">http://en.wikipedia.org/wiki/PMD_(software)#Copy.2FPaste_Detector_.28CPD.29</a>><br class="">
<br class="">
"Note that CPD works with Java, JSP, C, C++,
C#, Fortran and PHP code.<br class="">
Your own language is missing ? See how to
add it here"<br class="">
<<a moz-do-not-send="true" href="http://pmd.sourceforge.net/snapshot/cpd-usage.html" target="_blank" class="">http://pmd.sourceforge.net/snapshot/cpd-usage.html</a>><br class="">
<span class=""><font color="#888888" class="">--<br class="">
"The Direct3D Graphics Pipeline" free
book <<a moz-do-not-send="true" href="http://tinyurl.com/d3d-pipeline" target="_blank" class="">http://tinyurl.com/d3d-pipeline</a>><br class="">
The Computer Graphics Museum <<a moz-do-not-send="true" href="http://computergraphicsmuseum.org/" target="_blank" class="">http://ComputerGraphicsMuseum.org</a>><br class="">
The Terminals Wiki <<a moz-do-not-send="true" href="http://terminals.classiccmp.org/" target="_blank" class="">http://terminals.classiccmp.org</a>><br class="">
Legalize Adulthood! (my blog) <<a moz-do-not-send="true" href="http://legalizeadulthood.wordpress.com/" target="_blank" class="">http://LegalizeAdulthood.wordpress.com</a>><br class="">
</font></span>
<div class="">
<div class="">_______________________________________________<br class="">
cfe-dev mailing list<br class="">
<a moz-do-not-send="true" href="mailto:cfe-dev@cs.uiuc.edu" target="_blank" class="">cfe-dev@cs.uiuc.edu</a><br class="">
<a moz-do-not-send="true" href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank" class="">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
<br class="">
</blockquote>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
<br class="">
<br class="">
<pre class="moz-signature" cols="72">--
--------------------------------------------
Q: Why is this email five sentences or less?
A: <a class="moz-txt-link-freetext" href="http://five.sentenc.es/">http://five.sentenc.es</a>
</pre>
</div>
</div></blockquote></div><br class=""></div></body></html>