<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Hi all,<br>
        I just wanted to bump this thread, given that GSoC is starting. Last
      year I didn't manage to find a good student for this project (the
      proposal is below) :(. I thought it might work better if we went
      through the LLVM mentoring organization. Do you think this would
      make a good GSoC project from Clang's perspective? I'd be happy to
      update the proposal to make it more attractive or general-purpose.<br>
      Vassil<br>
      <br>
      <h3>Code copy/paste detection</h3>
      <div><strong>Description</strong>: Copy/paste is a common
        programming practice. Most programmers start from a code
        snippet that already exists in the system and modify it to match
        their needs. Some snippets easily end up being copied dozens of
        times, which hurts maintainability, understandability and
        logical design. <a class="ext"
          href="http://clang.llvm.org">Clang</a>
        and <a class="ext"
          href="http://clang-analyzer.llvm.org/">Clang's static
          analyzer</a> provide all the
        building blocks needed to build a generic C/C++ copy/paste detector.</div>
      <div><strong>Expected results</strong>: Build a standalone tool or
        Clang plugin able to detect copy/pasted code. Lay the
        foundations for detecting slightly modified code (semantic
        analysis required). Implement tests for all the implemented
        functionality. Prepare a final poster of the work and be ready
        to present it.</div>
      <div><strong>Required knowledge</strong>: Advanced C++, basic
        knowledge of Clang and the Clang Static Analyzer.</div>
      <p><strong>Mentor</strong>: Vassil Vassilev, maybe with somebody
        else as a second mentor?</p>
      <br>
      On 07/02/14 22:20, Nick Lewycky wrote:<br>
    </div>
    <blockquote
cite="mid:CADbEz-hdxzO6VFrRPewungnLxAPKZ7po1C07r5STaeV8z_+qpg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">On 7 February 2014 04:49, Vassil
            Vassilev <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:vvasilev@cern.ch" target="_blank">vvasilev@cern.ch</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im">
                  <div>On 05/02/14 21:32, Nick Lewycky wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">On 3 February 2014
                          14:08, Richard <span dir="ltr"><<a
                              moz-do-not-send="true"
                              href="mailto:legalize@xmission.com"
                              target="_blank">legalize@xmission.com</a>></span>
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>
                            In article <<a moz-do-not-send="true"
href="mailto:CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw@mail.gmail.com"
                              target="_blank">CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw@mail.gmail.com</a>>,<br>
                            <div>    David Blaikie <<a
                                moz-do-not-send="true"
                                href="mailto:dblaikie@gmail.com"
                                target="_blank">dblaikie@gmail.com</a>>

                              writes:<br>
                              <br>
                              > On Mon, Feb 3, 2014 at 3:06 AM,
                              Vassil Vassilev <<a
                                moz-do-not-send="true"
                                href="mailto:vvasilev@cern.ch"
                                target="_blank">vvasilev@cern.ch</a>>
                              wrote:<br>
                              ><br>
                            </div>
                            <div>> >   A few months ago I was
                              looking for a copy-paste detector for a
                              C++<br>
                              > > project. I didn't find such a
                              feature of clang's static analyzer. Is
                              this<br>
                              > > the case?<br>
                              ><br>
                              > copy-paste detector? As in plagiarism
                              detection?<br>
                              <br>
                            </div>
                            I don't think plagiarism is the concern.
                            The concern is<br>
                            copy/paste of blocks of code where the
                            pasted block needs to be<br>
                            updated in several places, but not all of
                            the updates were performed.<br>
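                            <div>A small illustration of that kind of
                              bug, as a sketch only. The names
                              <code>Size</code>, <code>width_fits</code>,
                              <code>height_fits_buggy</code> and
                              <code>height_fits</code> are made up for
                              this example and do not come from any
                              project discussed in this thread.</div>

```cpp
// Hypothetical types and functions, for illustration only.
struct Size {
    int width;
    int height;
};

// The line that was written first.
bool width_fits(const Size& item, const Size& box) {
    return item.width <= box.width;
}

// Pasted from width_fits and only partly updated: the left operand was
// changed to `height`, but the right operand still reads `width`.
bool height_fits_buggy(const Size& item, const Size& box) {
    return item.height <= box.width;
}

// What the author presumably intended.
bool height_fits(const Size& item, const Size& box) {
    return item.height <= box.height;
}
```

                            <div>For an item of size 2x5 and a box of
                              size 10x3, the buggy version accepts the
                              item while the intended version rejects
                              it. Both functions compile cleanly, which
                              is why a dedicated check is useful.</div>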
                          </blockquote>
                          <div><br>
                          </div>
                          <div>I've implemented this sort of thing, but
                            it's only 80% finished and has been kicking
                            around on the low-priority end of my todo
                            list for the past couple of years. Patch
                            attached. It'd be great if someone were
                            interested in finishing this off. I won't
                            get to it soon.</div>
                          <div><br>
                          </div>
                          <div>Note that it's a warning rather than a
                            static analysis check, which means that it
                            must have an aggressively low number of
                            false positives and must run
                            quickly. The implementation I have analyzes
                            conditional operators and if/else-if chains,
                            but doesn't collect all the expressions
                            through something like a &amp;&amp; b
                            &amp;&amp; c &amp;&amp; a. That would be the
                            next thing to add.</div>
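                          <div>A minimal sketch of what collecting the
                            operands of such a chain could look like. It
                            uses a toy expression tree, not Clang's AST,
                            and compares operands by their spelling,
                            which is an assumption made to keep the
                            example short. <code>Expr</code>,
                            <code>leaf</code>, <code>land</code>,
                            <code>flatten</code> and
                            <code>repeated_operands</code> are
                            hypothetical names and are not taken from
                            the attached patch.</div>

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy expression node; NOT a Clang AST type. A Leaf carries a spelling,
// an And node carries two children.
struct Expr {
    enum Kind { Leaf, And } kind;
    std::string spelling;            // used by Leaf only
    std::shared_ptr<Expr> lhs, rhs;  // used by And only
};

using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(const std::string& s) {
    return std::make_shared<Expr>(Expr{Expr::Leaf, s, nullptr, nullptr});
}

ExprPtr land(ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(
        Expr{Expr::And, "", std::move(l), std::move(r)});
}

// Collect the operands of a nested && chain in source order.
void flatten(const ExprPtr& e, std::vector<std::string>& out) {
    if (e->kind == Expr::And) {
        flatten(e->lhs, out);
        flatten(e->rhs, out);
    } else {
        out.push_back(e->spelling);
    }
}

// Return each spelling that occurs more than once in the chain,
// reported once, in order of its first repetition.
std::vector<std::string> repeated_operands(const ExprPtr& root) {
    std::vector<std::string> operands;
    flatten(root, operands);

    std::set<std::string> seen, reported;
    std::vector<std::string> dups;
    for (const auto& op : operands) {
        if (!seen.insert(op).second && reported.insert(op).second)
            dups.push_back(op);
    }
    return dups;
}
```

                          <div>Building the chain a &amp;&amp; b
                            &amp;&amp; c &amp;&amp; a as
                            <code>land(land(land(leaf("a"), leaf("b")),
                            leaf("c")), leaf("a"))</code> and passing it
                            to <code>repeated_operands</code> reports
                            the operand a. A real check would compare
                            AST nodes rather than spellings.</div>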
                          <div><br>
                          </div>
                          <div>It does have some really cool properties
                            that we can only get because clang
                            integrates closely with its preprocessor.
                            Consider this sample from the testcase:</div>
                          <div><br>
                            #define num_cpus() (1)<br>
                            #define max_omp_threads() (1)<br>
                            int test8(int expr) {<br>
                              if (expr) {<br>
                                return num_cpus();<br>
                              } else {<br>
                                return max_omp_threads();<br>
                              }<br>
                            }</div>
                          <div><br>
                          </div>
                          <div>We know better than to warn on that, even
                            though the AST looks the same. If you
                            instead write "return num_cpus();" twice, we
                            warn on that (that's test9 in the
                            testsuite).</div>
                          <div><br>
                          </div>
                          <div>Nick</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                Thanks, this looks very interesting. It may be a good
                starting point for a student. IIUC, non-unique exprs are
                the ones that have the same source ranges and the same
                FileIDs, right? Could this be upgraded to an AST-node
                (structural) comparison?</div>
            </blockquote>
            <div><br>
            </div>
            <div>It is an AST-node comparison. To handle the
              case of different macros, we ask each AST node what its
              SourceLocation was and factor in the macro ID, if there
              was one. A large part of the patch is a change to the
              Stmt::Profile logic to look at all the SourceLocations in
              all the possible AST nodes.</div>
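            <div>To make the idea concrete, here is a rough sketch of a
              structural fingerprint that optionally factors in the
              macro a node came from. It is a toy model, not the patch
              and not Clang's actual profiling code: <code>Node</code>,
              <code>profile</code> and <code>return_of</code> are
              hypothetical names, and a plain string stands in for
              whatever ID type the real implementation uses.</div>

```cpp
#include <string>
#include <vector>

// Toy statement node; NOT clang::Stmt. `macro` is the name of the macro
// the node was expanded from, or empty if it was written directly.
struct Node {
    std::string kind;
    std::string value;
    std::string macro;
    std::vector<Node> children;
};

// Build a structural fingerprint of a subtree. With `use_macro` set, the
// originating macro name becomes part of the fingerprint, so identical
// expansions of different macros no longer compare equal.
std::string profile(const Node& n, bool use_macro) {
    std::string id = n.kind + ":" + n.value;
    if (use_macro && !n.macro.empty())
        id += "@" + n.macro;
    id += "(";
    for (const Node& child : n.children)
        id += profile(child, use_macro) + ",";
    id += ")";
    return id;
}

// Model `return MACRO();` where MACRO() expands to `(1)`.
Node return_of(const std::string& macro_name) {
    Node literal{"IntegerLiteral", "1", macro_name, {}};
    Node paren{"ParenExpr", "", macro_name, {literal}};
    return Node{"ReturnStmt", "", "", {paren}};
}
```

            <div>With <code>use_macro</code> off,
              <code>return_of("num_cpus")</code> and
              <code>return_of("max_omp_threads")</code> produce the same
              fingerprint, because both expand to the same tree. With it
              on, they differ, while two uses of the same macro still
              match.</div>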
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"><span class="HOEnZb"><font
                    color="#888888"><br>
                    Vassil</font></span>
                <div class="im"><br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div><br>
                          </div>
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Coverity
                            can detect such cases, for example.<br>
                            <br>
                            Here is an article from 2006 describing such
                            a tool:<br>
                            <<a moz-do-not-send="true"
                              href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113"
                              target="_blank">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113</a>><br>
                            <br>
                            Wikipedia says PMD has a copy/paste detector
                            that works with C++:<br>
                            <<a moz-do-not-send="true"
href="http://en.wikipedia.org/wiki/PMD_%28software%29#Copy.2FPaste_Detector_.28CPD.29"
                              target="_blank">http://en.wikipedia.org/wiki/PMD_(software)#Copy.2FPaste_Detector_.28CPD.29</a>><br>
                            <br>
                            "Note that CPD works with Java, JSP, C, C++,
                            C#, Fortran and PHP code.<br>
                            Your own language is missing ? See how to
                            add it here"<br>
                            <<a moz-do-not-send="true"
                              href="http://pmd.sourceforge.net/snapshot/cpd-usage.html"
                              target="_blank">http://pmd.sourceforge.net/snapshot/cpd-usage.html</a>><br>
                            <span><font color="#888888">--<br>
                                "The Direct3D Graphics Pipeline" free
                                book <<a moz-do-not-send="true"
                                  href="http://tinyurl.com/d3d-pipeline"
                                  target="_blank">http://tinyurl.com/d3d-pipeline</a>><br>
                                     The Computer Graphics Museum <<a
                                  moz-do-not-send="true"
                                  href="http://ComputerGraphicsMuseum.org"
                                  target="_blank">http://ComputerGraphicsMuseum.org</a>><br>
                                         The Terminals Wiki <<a
                                  moz-do-not-send="true"
                                  href="http://terminals.classiccmp.org"
                                  target="_blank">http://terminals.classiccmp.org</a>><br>
                                  Legalize Adulthood! (my blog) <<a
                                  moz-do-not-send="true"
                                  href="http://LegalizeAdulthood.wordpress.com"
                                  target="_blank">http://LegalizeAdulthood.wordpress.com</a>><br>
                              </font></span>
                            <div>
                              <div>_______________________________________________<br>
                                cfe-dev mailing list<br>
                                <a moz-do-not-send="true"
                                  href="mailto:cfe-dev@cs.uiuc.edu"
                                  target="_blank">cfe-dev@cs.uiuc.edu</a><br>
                                <a moz-do-not-send="true"
                                  href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev"
                                  target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                    </div>
                    <br>
                  </blockquote>
                  <br>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
    <br>
    <pre class="moz-signature" cols="72">-- 
--------------------------------------------
Q: Why is this email five sentences or less?
A: <a class="moz-txt-link-freetext" href="http://five.sentenc.es">http://five.sentenc.es</a>
</pre>
  </body>
</html>