<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi all,<br>
I just wanted to bump this up, given that GSoC is starting. I didn't
manage to find a good student for this project (proposal below)
last year :(. I thought it might work better if we went through the
LLVM mentoring organization. Do you think this would
make a good GSoC project from Clang's perspective? I'd be happy to
update the proposal to make it more attractive or general-purpose.<br>
Vassil<br>
<br>
<h3>Code copy/paste detection</h3>
<div><strong>Description</strong>: Copy/paste is a common
programming practice. Most programmers start from a code
snippet that already exists in the system and modify it to match
their needs. Some snippets easily end up being
copied dozens of times, which hurts maintainability,
understandability and logical design. <a class="ext"
href="http://clang.llvm.org">Clang</a>
and <a class="ext"
href="http://clang-analyzer.llvm.org/">clang's static
analyzer</a> provide all the
building blocks to build a generic C/C++ copy/paste detector.</div>
<div><strong>Expected results</strong>: Build a standalone tool or
clang plugin that can detect copy/pasted code. Lay the
foundations for detecting slightly modified code (semantic
analysis required). Implement tests for all the implemented
functionality. Prepare a final poster of the work and be ready
to present it.</div>
<div><strong>Required knowledge</strong>: Advanced C++, Basic
knowledge of Clang/Clang Static Analyzer.</div>
<p><strong>Mentor</strong>: Vassil Vassilev, maybe with somebody else
as second mentor?</p>
<br>
On 07/02/14 22:20, Nick Lewycky wrote:<br>
</div>
<blockquote
cite="mid:CADbEz-hdxzO6VFrRPewungnLxAPKZ7po1C07r5STaeV8z_+qpg@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On 7 February 2014 04:49, Vassil
Vassilev <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:vvasilev@cern.ch" target="_blank">vvasilev@cern.ch</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im">
<div>On 05/02/14 21:32, Nick Lewycky wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On 3 February 2014
14:08, Richard <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:legalize@xmission.com"
target="_blank">legalize@xmission.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>
In article <<a moz-do-not-send="true"
href="mailto:CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw@mail.gmail.com"
target="_blank">CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw@mail.gmail.com</a>>,<br>
<div> David Blaikie <<a
moz-do-not-send="true"
href="mailto:dblaikie@gmail.com"
target="_blank">dblaikie@gmail.com</a>>
writes:<br>
<br>
> On Mon, Feb 3, 2014 at 3:06 AM,
Vassil Vassilev <<a
moz-do-not-send="true"
href="mailto:vvasilev@cern.ch"
target="_blank">vvasilev@cern.ch</a>>
wrote:<br>
><br>
</div>
<div>> > A few months ago I was
looking for a copy-paste detector for a
C++<br>
> > project. I didn't find such a
feature of clang's static analyzer. Is
this<br>
> > the case?<br>
><br>
> copy-paste detector? As in plagiarism
detection?<br>
<br>
</div>
I don't think plagiarism is the concern.
The concern is<br>
copy/paste of blocks of code where the
pasted block needs to be<br>
updated in several places, but not all of
the updates were performed.<br>
</blockquote>
<div><br>
</div>
<div>I've implemented this sort of thing, but
it's only 80% finished and has been kicking
around on the low-priority end of my todo
list for the past couple of years. Patch
attached. It'd be great if someone were
interested in finishing this off. I won't
get to it soon.</div>
<div><br>
</div>
<div>Note that it's a warning instead of a
static analysis check, which means that it
must have an aggressively low number of
false positives, and that it must run
quickly. The implementation I have analyzes
conditional operators and if/else if chains,
but doesn't collect all the expressions
through something like a && b
&& c && a. That would be the
next thing to add.</div>
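<div>A minimal sketch of what collecting the operands of such a chain
could look like. It uses a toy expression tree rather than Clang's AST,
and every name in it (Expr, leaf, makeAnd, collectOperands,
repeatedOperands) is hypothetical, not taken from the patch:</div>

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy expression tree; this is NOT Clang's AST.
struct Expr {
  std::string text;            // spelling of a leaf operand, e.g. "a"
  std::unique_ptr<Expr> lhs;   // both children set only for '&&' nodes
  std::unique_ptr<Expr> rhs;
  bool isAnd() const { return lhs && rhs; }
};

// Build a leaf operand.
std::unique_ptr<Expr> leaf(std::string s) {
  auto e = std::make_unique<Expr>();
  e->text = std::move(s);
  return e;
}

// Build an 'l && r' node.
std::unique_ptr<Expr> makeAnd(std::unique_ptr<Expr> l,
                              std::unique_ptr<Expr> r) {
  auto e = std::make_unique<Expr>();
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Walk nested '&&' nodes and append the leaf operands in source order.
void collectOperands(const Expr &e, std::vector<std::string> &out) {
  if (e.isAnd()) {
    collectOperands(*e.lhs, out);
    collectOperands(*e.rhs, out);
  } else {
    out.push_back(e.text);
  }
}

// Return every operand that repeats an earlier operand in the chain.
std::vector<std::string> repeatedOperands(const Expr &e) {
  std::vector<std::string> operands;
  collectOperands(e, operands);
  std::set<std::string> seen;
  std::vector<std::string> repeated;
  for (const auto &op : operands)
    if (!seen.insert(op).second)  // insert fails if already present
      repeated.push_back(op);
  return repeated;
}
```

<div>Comparing operands by spelling is a simplification made for this
sketch; a real check would compare profiled AST nodes instead.</div>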
<div><br>
</div>
<div>It does have some really cool properties
that we can only get because clang
integrates closely with its preprocessor.
Consider this sample from the testcase:</div>
<div><br>
#define num_cpus() (1)<br>
#define max_omp_threads() (1)<br>
int test8(int expr) {<br>
if (expr) {<br>
return num_cpus();<br>
} else {<br>
return max_omp_threads();<br>
}<br>
}</div>
<div><br>
</div>
<div>We know better than to warn on that, even
though the AST looks the same. If you
instead write "return num_cpus();" twice, we
warn on that (that's test9 in the
testsuite).</div>
<div><br>
</div>
<div>Nick</div>
</div>
</div>
</div>
</blockquote>
</div>
Thanks, this looks very interesting. This may be a good
start for a student. IIUC, non-unique exprs are the ones
that have the same source ranges and the same FileIDs, right?
Could this be upgraded to AST-node (structural)
comparison?</div>
</blockquote>
<div><br>
</div>
<div>It is an AST-node comparison. In order to handle the
case of different macros, we ask the AST nodes what their
SourceLocation was, and factor in the macro ID, if there
was one. A large part of the patch is a change to the
Stmt::Profile logic to look at all the SourceLocations in
all the possible AST nodes.</div>
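<div>A rough sketch of the idea of factoring macro identity into the
comparison. It does not use Clang's API: Node, Profile, profileOf and
looksCopied are hypothetical names, and the macro is identified by its
name only as an assumption made for illustration:</div>

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an AST node after preprocessing.
struct Node {
  std::string expanded;   // tokens after macro expansion, e.g. "(1)"
  std::string macroName;  // macro the node came from; empty if none
};

// The comparison key: expanded form plus the macro it originated from.
using Profile = std::pair<std::string, std::string>;

Profile profileOf(const Node &n) { return {n.expanded, n.macroName}; }

// Two nodes count as duplicates only if both the expanded tokens and
// the originating macro agree.
bool looksCopied(const Node &a, const Node &b) {
  return profileOf(a) == profileOf(b);
}
```

<div>With this key, two branches that both expand to (1) but come from
num_cpus() and max_omp_threads() compare as different, while two uses of
num_cpus() compare as equal.</div>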
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class="HOEnZb"><font
color="#888888"><br>
Vassil</font></span>
<div class="im"><br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Coverity
can detect such instances, for instance.<br>
<br>
Here is an article from 2006 describing such
a tool:<br>
<<a moz-do-not-send="true"
href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113"
target="_blank">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113</a>><br>
<br>
Wikipedia says PMD has a copy/paste detector
that works with C++:<br>
<<a moz-do-not-send="true"
href="http://en.wikipedia.org/wiki/PMD_%28software%29#Copy.2FPaste_Detector_.28CPD.29"
target="_blank">http://en.wikipedia.org/wiki/PMD_(software)#Copy.2FPaste_Detector_.28CPD.29</a>><br>
<br>
"Note that CPD works with Java, JSP, C, C++,
C#, Fortran and PHP code.<br>
Your own language is missing ? See how to
add it here"<br>
<<a moz-do-not-send="true"
href="http://pmd.sourceforge.net/snapshot/cpd-usage.html"
target="_blank">http://pmd.sourceforge.net/snapshot/cpd-usage.html</a>><br>
<span><font color="#888888">--<br>
"The Direct3D Graphics Pipeline" free
book <<a moz-do-not-send="true"
href="http://tinyurl.com/d3d-pipeline"
target="_blank">http://tinyurl.com/d3d-pipeline</a>><br>
The Computer Graphics Museum <<a
moz-do-not-send="true"
href="http://ComputerGraphicsMuseum.org"
target="_blank">http://ComputerGraphicsMuseum.org</a>><br>
The Terminals Wiki <<a
moz-do-not-send="true"
href="http://terminals.classiccmp.org"
target="_blank">http://terminals.classiccmp.org</a>><br>
Legalize Adulthood! (my blog) <<a
moz-do-not-send="true"
href="http://LegalizeAdulthood.wordpress.com"
target="_blank">http://LegalizeAdulthood.wordpress.com</a>><br>
</font></span>
<div>
<div>_______________________________________________<br>
cfe-dev mailing list<br>
<a moz-do-not-send="true"
href="mailto:cfe-dev@cs.uiuc.edu"
target="_blank">cfe-dev@cs.uiuc.edu</a><br>
<a moz-do-not-send="true"
href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev"
target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
--------------------------------------------
Q: Why is this email five sentences or less?
A: <a class="moz-txt-link-freetext" href="http://five.sentenc.es">http://five.sentenc.es</a>
</pre>
</body>
</html>