<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">That's great! What would be the next
      steps? Do you know who will be the GSoC org admin? Do you think we
      should improve the project description and nominate a backup
      mentor?<br>
      Vassil<br>
      On 17/02/15 20:05, Anna Zaks wrote:<br>
    </div>
    <blockquote
      cite="mid:DEA2B2DD-85C9-4BE2-A37C-775EC94FCD7C@apple.com"
      type="cite">
      <div class="">This would be a very useful feature to have in the
        clang static analyzer and can be scoped for a GSoC project!</div>
      <div class=""><br class="">
      </div>
      <div class="">Anna.</div>
      <div class=""><br class="">
      </div>
      <div class="">
        <div>
          <blockquote type="cite" class="">
            <div class="">On Feb 10, 2015, at 4:06 AM, Vassil Vassilev
              <<a moz-do-not-send="true"
                href="mailto:vvasilev@cern.ch" class="">vvasilev@cern.ch</a>>
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div text="#000000" bgcolor="#FFFFFF" class="">
                <div class="moz-cite-prefix">Hi all,<br class="">
                  I just wanted to bump this up, given that GSoC is
                  starting. Last year I didn't manage to find a good
                  student for this project (the proposal is below) :(. I
                  thought it might work better if we went through LLVM as
                  the mentoring organization. Do you think this would
                  make a good GSoC project from Clang's perspective? I'd
                  be happy to update the proposal to make it more
                  attractive or more general-purpose.<br class="">
                  Vassil<br class="">
                  <br class="">
                  <h3 class="">Code copy/paste detection</h3>
                  <div class=""><strong class="">Description</strong>:
                    Copy/paste is a common programming practice. Most
                    programmers start from a code snippet that already
                    exists in the system and modify it to match their
                    needs. Some snippets easily end up being copied
                    dozens of times, which hurts maintainability,
                    understandability and logical design. <a
                      moz-do-not-send="true" class="ext"
                      href="http://clang.llvm.org/">Clang</a> and <a
                      moz-do-not-send="true" class="ext"
                      href="http://clang-analyzer.llvm.org/">clang's
                      static analyzer</a> provide all the building blocks
                    needed to build a generic C/C++ copy/paste detector.</div>
                  <div class=""><strong class="">Expected results</strong>:
                    Build a standalone tool or clang plugin that can
                    detect copy/pasted code. Lay the foundations for
                    detecting slightly modified copies (semantic
                    analysis required). Implement tests for all the
                    implemented functionality. Prepare a final poster of
                    the work and be ready to present it.</div>
                  <div class=""><strong class="">Required knowledge</strong>:
                    Advanced C++, Basic knowledge of Clang/Clang Static
                    Analyzer.</div>
                  <p class=""><strong class="">Mentor</strong>: Vassil
                    Vassilev; maybe somebody else as a second mentor?</p>
                  <br class="">
                  On 07/02/14 22:20, Nick Lewycky wrote:<br class="">
                </div>
                <blockquote
cite="mid:CADbEz-hdxzO6VFrRPewungnLxAPKZ7po1C07r5STaeV8z_+qpg@mail.gmail.com"
                  type="cite" class="">
                  <div dir="ltr" class="">
                    <div class="gmail_extra">
                      <div class="gmail_quote">On 7 February 2014 04:49,
                        Vassil Vassilev <span dir="ltr" class=""><<a
                            moz-do-not-send="true"
                            href="mailto:vvasilev@cern.ch"
                            target="_blank" class="">vvasilev@cern.ch</a>></span>
                        wrote:<br class="">
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          <div bgcolor="#FFFFFF" text="#000000" class="">
                            <div class="im">
                              <div class="">On 05/02/14 21:32, Nick
                                Lewycky wrote:<br class="">
                              </div>
                              <blockquote type="cite" class="">
                                <div dir="ltr" class="">
                                  <div class="gmail_extra">
                                    <div class="gmail_quote">On 3
                                      February 2014 14:08, Richard <span
                                        dir="ltr" class=""><<a
                                          moz-do-not-send="true"
                                          href="mailto:legalize@xmission.com"
                                          target="_blank" class="">legalize@xmission.com</a>></span>
                                      wrote:<br class="">
                                      <blockquote class="gmail_quote"
                                        style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br
                                          class="">
                                        In article <<a
                                          moz-do-not-send="true"
href="mailto:CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw@mail.gmail.com"
                                          target="_blank" class="">CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw@mail.gmail.com</a>>,<br
                                          class="">
                                        <div class="">    David Blaikie
                                          <<a moz-do-not-send="true"
href="mailto:dblaikie@gmail.com" target="_blank" class="">dblaikie@gmail.com</a>>


                                          writes:<br class="">
                                          <br class="">
                                          > On Mon, Feb 3, 2014 at
                                          3:06 AM, Vassil Vassilev <<a
                                            moz-do-not-send="true"
                                            href="mailto:vvasilev@cern.ch"
                                            target="_blank" class="">vvasilev@cern.ch</a>>

                                          wrote:<br class="">
                                          ><br class="">
                                        </div>
                                        <div class="">> >   A few
                                          months ago I was looking for a
                                          copy-paste detector for a C++<br
                                            class="">
                                          > > project. I didn't
                                          find such a feature of clang's
                                          static analyzer. Is this<br
                                            class="">
                                          > > the case?<br
                                            class="">
                                          ><br class="">
                                          > copy-paste detector? As
                                          in plagiarism detection?<br
                                            class="">
                                          <br class="">
                                        </div>
                                        I don't think plagiarism is the
                                        concern.  The concern is<br
                                          class="">
                                        copy/paste of blocks of code
                                        where the pasted block needs to
                                        be<br class="">
                                        updated in several places, but
                                        not all of the updates were
                                        performed.<br class="">
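                                        <div class="">A hypothetical illustration of that concern (not taken from any code discussed in this thread; point, scale_pasted and scale_intended are made-up names): the second assignment was pasted from the first, and only one of the two places that needed updating was changed.</div>
<pre class="">
```c
/* Hypothetical example of an incompletely updated paste. */
struct point {
  int x;
  int y;
};

/* The r.y line was pasted from the r.x line. The left-hand side was
   updated, but the right-hand side still reads p.x. */
struct point scale_pasted(struct point p, int k) {
  struct point r;
  r.x = p.x * k;
  r.y = p.x * k; /* should read p.y */
  return r;
}

/* What the author presumably meant to write. */
struct point scale_intended(struct point p, int k) {
  struct point r;
  r.x = p.x * k;
  r.y = p.y * k;
  return r;
}
```
</pre>
                                        <div class="">Both versions compile without complaint, which is why a dedicated check is attractive here.</div>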
                                      </blockquote>
                                      <div class=""><br class="">
                                      </div>
                                      <div class="">I've implemented
                                        this sort of thing, but it's
                                        only 80% finished and has been
                                        kicking around on the
                                        low-priority end of my todo list
                                        for the past couple of years.
                                        Patch attached. It'd be great if
                                        someone were interested in
                                        finishing this off. I won't get
                                        to it soon.</div>
                                      <div class=""><br class="">
                                      </div>
                                      <div class="">Note that it's a
                                        warning instead of a static
                                        analysis check, which means that
                                        it must have an aggressively low
                                        number of false positives and
                                        that it must run quickly. The
                                        implementation I have analyzes
                                        conditional operators and
                                        if/else-if chains, but doesn't
                                        collect all the expressions
                                        through something like a
                                        && b && c
                                        && a. That would be the
                                        next thing to add.</div>
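                                      <div class="">To make the two cases concrete, here is a hypothetical pair of inputs (not from the patch's test suite; classify and all_set are made-up names). The first repeats a condition in an if/else-if chain, the kind of chain described above as analyzed. The second repeats an operand in a logical-and chain, the case described above as not yet collected.</div>
<pre class="">
```c
/* Repeated condition in an if/else-if chain: the third branch can never
   be taken, because n == 0 was already handled by the first. */
int classify(int n) {
  if (n == 0) {
    return 0;
  } else if (n == 1) {
    return 1;
  } else if (n == 0) { /* pasted from the first branch, never updated */
    return 2;
  }
  return -1;
}

/* Repeated operand in a logical-and chain: the trailing a adds nothing
   and was probably meant to be a different variable. */
int all_set(int a, int b, int c) {
  return a && b && c && a;
}
```
</pre>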
                                      <div class=""><br class="">
                                      </div>
                                      <div class="">It does have some
                                        really cool properties that we
                                        can only get because clang
                                        integrates closely with its
                                        preprocessor. Consider this
                                        sample from the testcase:</div>
                                      <div class=""><br class="">
                                        #define num_cpus() (1)<br
                                          class="">
                                        #define max_omp_threads() (1)<br
                                          class="">
                                        int test8(int expr) {<br
                                          class="">
                                          if (expr) {<br class="">
                                            return num_cpus();<br
                                          class="">
                                          } else {<br class="">
                                            return max_omp_threads();<br
                                          class="">
                                          }<br class="">
                                        }</div>
                                      <div class=""><br class="">
                                      </div>
                                      <div class="">We know better than
                                        to warn on that, even though the
                                        AST looks the same. If you
                                        instead write "return
                                        num_cpus();" twice, we warn on
                                        that (that's test9 in the
                                        testsuite).</div>
                                      <div class=""><br class="">
                                      </div>
                                      <div class="">Nick</div>
                                    </div>
                                  </div>
                                </div>
                              </blockquote>
                            </div>
                            Thanks, this looks very interesting. This may
                            be a good start for a student. IIUC,
                            non-unique exprs are the ones that have the
                            same source ranges and the same FileIDs,
                            right? Could this be upgraded to AST-node
                            (structural) comparison?</div>
                        </blockquote>
                        <div class=""><br class="">
                        </div>
                        <div class="">It is an AST-node comparison. In
                          order to handle the case of different macros,
                          we ask the AST nodes what their SourceLocation
                          was, and factor in the macro ID, if there was
                          one. A large part of the patch is a change to
                          the Stmt::Profile logic to look at all the
                          SourceLocations in all the possible AST nodes.</div>
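                        <div class="">A toy model of that idea, as a sketch only: this is plain C, not clang's API, and expr_key, its fields and same_expr are hypothetical names. Two expressions count as duplicates only if their structure matches and they came from the same macro (0 meaning no macro at all).</div>
<pre class="">
```c
/* Toy stand-in for a profiled AST node; not a clang data structure. */
struct expr_key {
  int kind;     /* e.g. 1 = integer literal */
  int value;    /* literal value after macro expansion */
  int macro_id; /* 0 = not from a macro, otherwise a per-macro number */
};

/* Equal only if both the structure and the macro origin agree. */
int same_expr(struct expr_key a, struct expr_key b) {
  return a.kind == b.kind && a.value == b.value && a.macro_id == b.macro_id;
}
```
</pre>
                        <div class="">With num_cpus() and max_omp_threads() both expanding to (1), the two keys would differ only in macro_id and so compare unequal, which matches the test8 behaviour quoted earlier in the thread.</div>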
                        <div class=""> </div>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          <div bgcolor="#FFFFFF" text="#000000" class=""><span
                              class="HOEnZb"><font class=""
                                color="#888888"><br class="">
                                Vassil</font></span>
                            <div class="im"><br class="">
                              <blockquote type="cite" class="">
                                <div dir="ltr" class="">
                                  <div class="gmail_extra">
                                    <div class="gmail_quote">
                                      <div class=""><br class="">
                                      </div>
                                      <blockquote class="gmail_quote"
                                        style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Coverity


                                        can detect such instances, for
                                        instance.<br class="">
                                        <br class="">
                                        Here is an article from 2006
                                        describing such a tool:<br
                                          class="">
                                        <<a moz-do-not-send="true"
                                          href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113"
                                          target="_blank" class="">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113</a>><br
                                          class="">
                                        <br class="">
                                        Wikipedia says PMD has a
                                        copy/paste detector that works
                                        with C++:<br class="">
                                        <<a moz-do-not-send="true"
href="http://en.wikipedia.org/wiki/PMD_%28software%29#Copy.2FPaste_Detector_.28CPD.29"
                                          target="_blank" class="">http://en.wikipedia.org/wiki/PMD_(software)#Copy.2FPaste_Detector_.28CPD.29</a>><br
                                          class="">
                                        <br class="">
                                        "Note that CPD works with Java,
                                        JSP, C, C++, C#, Fortran and PHP
                                        code.<br class="">
                                        Your own language is missing ?
                                        See how to add it here"<br
                                          class="">
                                        <<a moz-do-not-send="true"
                                          href="http://pmd.sourceforge.net/snapshot/cpd-usage.html"
                                          target="_blank" class="">http://pmd.sourceforge.net/snapshot/cpd-usage.html</a>><br
                                          class="">
                                        <span class=""><font class=""
                                            color="#888888">--<br
                                              class="">
                                            "The Direct3D Graphics
                                            Pipeline" free book <<a
                                              moz-do-not-send="true"
                                              href="http://tinyurl.com/d3d-pipeline"
                                              target="_blank" class="">http://tinyurl.com/d3d-pipeline</a>><br
                                              class="">
                                                 The Computer Graphics
                                            Museum <<a
                                              moz-do-not-send="true"
                                              href="http://computergraphicsmuseum.org/"
                                              target="_blank" class="">http://ComputerGraphicsMuseum.org</a>><br
                                              class="">
                                                     The Terminals Wiki
                                            <<a
                                              moz-do-not-send="true"
                                              href="http://terminals.classiccmp.org/"
                                              target="_blank" class="">http://terminals.classiccmp.org</a>><br
                                              class="">
                                              Legalize Adulthood! (my
                                            blog) <<a
                                              moz-do-not-send="true"
                                              href="http://legalizeadulthood.wordpress.com/"
                                              target="_blank" class="">http://LegalizeAdulthood.wordpress.com</a>><br
                                              class="">
                                          </font></span>
                                        <div class="">
                                          <div class="">_______________________________________________<br
                                              class="">
                                            cfe-dev mailing list<br
                                              class="">
                                            <a moz-do-not-send="true"
                                              href="mailto:cfe-dev@cs.uiuc.edu"
                                              target="_blank" class="">cfe-dev@cs.uiuc.edu</a><br
                                              class="">
                                            <a moz-do-not-send="true"
                                              href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev"
                                              target="_blank" class="">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br
                                              class="">
                                          </div>
                                        </div>
                                      </blockquote>
                                    </div>
                                    <br class="">
                                  </div>
                                </div>
                                <br class="">
                              </blockquote>
                              <br class="">
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br class="">
                    </div>
                  </div>
                </blockquote>
                <br class="">
                <br class="">
                <pre class="moz-signature" cols="72">-- 
--------------------------------------------
Q: Why is this email five sentences or less?
A: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://five.sentenc.es/">http://five.sentenc.es</a>
</pre>
              </div>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </body>
</html>