<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 24/02/15 06:15, Anna Zaks wrote:<br>
    </div>
    <blockquote
      cite="mid:95677D8A-FCEF-4A75-AA28-52D01DCF35C6@apple.com"
      type="cite">
      <br class="">
      <div>
        <blockquote type="cite" class="">
          <div class="">On Feb 18, 2015, at 2:50 AM, Vassil Vassilev
            <<a moz-do-not-send="true" href="mailto:vvasilev@cern.ch"
              class="">vvasilev@cern.ch</a>> wrote:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class="">
              <div class="moz-cite-prefix">That's great! What would be
                the next steps? Do you know who will be the GSoC org
                admin? </div>
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        There was an email sent about GSoC a couple of days ago to the
        LLVMDev list.</div>
    </blockquote>
    Thanks for the information. I have addressed all of your comments and
    sent a patch for OpenProjects.html, cc'ing you, Anna, for
    review.<br>
    Many thanks,<br>
    Vassil<br>
    <blockquote
      cite="mid:95677D8A-FCEF-4A75-AA28-52D01DCF35C6@apple.com"
      type="cite">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class="">
              <div class="moz-cite-prefix">Do you think we should
                improve the project description</div>
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        <div>I think adding specific examples that we want to handle
          would be useful in scoping this down.</div>
      </div>
    </blockquote>
    <blockquote
      cite="mid:95677D8A-FCEF-4A75-AA28-52D01DCF35C6@apple.com"
      type="cite">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class="">
              <div class="moz-cite-prefix"> and nominate a backup
                mentor?<br class="">
                Vassil<br class="">
                On 17/02/15 20:05, Anna Zaks wrote:<br class="">
              </div>
              <blockquote
                cite="mid:DEA2B2DD-85C9-4BE2-A37C-775EC94FCD7C@apple.com"
                type="cite" class="">
                <div class="">This would be a very useful feature to
                  have in the clang static analyzer and can be scoped
                  for a GSoC project!</div>
                <div class=""><br class="">
                </div>
                <div class="">Anna.</div>
                <div class=""><br class="">
                </div>
                <div class="">
                  <div class="">
                    <blockquote type="cite" class="">
                      <div class="">On Feb 10, 2015, at 4:06 AM, Vassil
                        Vassilev <<a moz-do-not-send="true"
                          href="mailto:vvasilev@cern.ch" class="">vvasilev@cern.ch</a>>

                        wrote:</div>
                      <br class="Apple-interchange-newline">
                      <div class="">
                        <div text="#000000" bgcolor="#FFFFFF" class="">
                          <div class="moz-cite-prefix">Hi all,<br
                              class="">
                              I just wanted to bump this up (given GSoC
                            is starting). I didn't manage to get a good
                            student for this project (proposal is below)
                            last year :(. I thought it might work better
                            if we went through the LLVM mentoring
                            organization. Do you think this would
                            make a good GSoC project from Clang's
                            perspective? I'd be happy to update the
                            proposal to make it more attractive or
                            general-purpose.<br class="">
                            Vassil<br class="">
                            <br class="">
                            <h3 class="">Code copy/paste detection</h3>
                            <div class=""><strong class="">Description</strong>:
                              Copy/paste is a common programming practice.
                              Most programmers start from a code
                              snippet that already exists in the system
                              and modify it to match their needs. Some
                              code snippets easily end up being
                              copied dozens of times, which hurts
                              maintainability, understandability
                              and logical design. <a
                                moz-do-not-send="true" class="ext"
                                href="http://clang.llvm.org/">Clang</a> and <a
                                moz-do-not-send="true" class="ext"
                                href="http://clang-analyzer.llvm.org/">Clang's
                                static analyzer</a> provide
                              all the building blocks to build a generic
                              C/C++ copy/paste detector.</div>
                            <div class=""><strong class="">Expected
                                results</strong>: Build a standalone tool
                              or Clang plugin able to detect
                              copy/pasted code.</div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                </div>
              </blockquote>
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        <div>I think having this integrated into one of the existing
          clang tools should be the goal. For example, the static
          analyzer is a good fit. The static analyzer does not have
          plugins.</div>
        <br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class="">
              <blockquote
                cite="mid:DEA2B2DD-85C9-4BE2-A37C-775EC94FCD7C@apple.com"
                type="cite" class="">
                <div class="">
                  <div class="">
                    <blockquote type="cite" class="">
                      <div class="">
                        <div text="#000000" bgcolor="#FFFFFF" class="">
                          <div class="moz-cite-prefix">
                            <div class=""> Lay the foundations for
                              detecting slightly modified code
                              (semantic analysis required). Implement
                              tests for all implemented functionality.
                              Prepare a final poster of the work and be
                              ready to present it.</div>
                            <div class=""><strong class="">Required
                                knowledge</strong>: Advanced C++, basic
                              knowledge of Clang and the Clang Static Analyzer.</div>
                            <p class=""><strong class="">Mentor</strong>:
                              Vassil Vassilev / maybe somebody else as
                              second mentor?</p>
                            <br class="">
                            On 07/02/14 22:20, Nick Lewycky wrote:<br
                              class="">
                          </div>
                          <blockquote
cite="mid:CADbEz-hdxzO6VFrRPewungnLxAPKZ7po1C07r5STaeV8z_+qpg@mail.gmail.com"
                            type="cite" class="">
                            <div dir="ltr" class="">
                              <div class="gmail_extra">
                                <div class="gmail_quote">On 7 February
                                  2014 04:49, Vassil Vassilev <span
                                    dir="ltr" class=""><<a
                                      moz-do-not-send="true"
                                      href="mailto:vvasilev@cern.ch"
                                      target="_blank" class="">vvasilev@cern.ch</a>></span>
                                  wrote:<br class="">
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                    <div bgcolor="#FFFFFF"
                                      text="#000000" class="">
                                      <div class="im">
                                        <div class="">On 05/02/14 21:32,
                                          Nick Lewycky wrote:<br
                                            class="">
                                        </div>
                                        <blockquote type="cite" class="">
                                          <div dir="ltr" class="">
                                            <div class="gmail_extra">
                                              <div class="gmail_quote">On
                                                3 February 2014 14:08,
                                                Richard <span dir="ltr"
                                                  class=""><<a
                                                    moz-do-not-send="true"
href="mailto:legalize@xmission.com" target="_blank" class="">legalize@xmission.com</a>></span>
                                                wrote:<br class="">
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0px 0px
                                                  0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br
                                                    class="">
                                                  In article <<a
                                                    moz-do-not-send="true"
href="mailto:CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw@mail.gmail.com"
                                                    target="_blank"
                                                    class="">CAENS6EsgzhXWfANFze8VAp68qDGHnrHNZJaaLmi28YJtnQwOmw@mail.gmail.com</a>>,<br
                                                    class="">
                                                  <div class="">   
                                                    David Blaikie <<a
moz-do-not-send="true" href="mailto:dblaikie@gmail.com" target="_blank"
                                                      class="">dblaikie@gmail.com</a>>



                                                    writes:<br class="">
                                                    <br class="">
                                                    > On Mon, Feb 3,
                                                    2014 at 3:06 AM,
                                                    Vassil Vassilev <<a
moz-do-not-send="true" href="mailto:vvasilev@cern.ch" target="_blank"
                                                      class="">vvasilev@cern.ch</a>>


                                                    wrote:<br class="">
                                                    ><br class="">
                                                  </div>
                                                  <div class="">>
                                                    >   A few months
                                                    ago I was looking
                                                    for a copy-paste
                                                    detector for a C++<br
                                                      class="">
                                                    > > project. I
                                                    didn't find such a
                                                    feature of clang's
                                                    static analyzer. Is
                                                    this<br class="">
                                                    > > the case?<br
                                                      class="">
                                                    ><br class="">
                                                    > copy-paste
                                                    detector? As in
                                                    plagiarism detection?<br
                                                      class="">
                                                    <br class="">
                                                  </div>
                                                  I don't think
                                                  plagiarism is the
                                                  concern.  The concern
                                                  is<br class="">
                                                  copy/paste of blocks
                                                  of code where the
                                                  pasted block needs to
                                                  be<br class="">
                                                  updated in several
                                                  places, but not all of
                                                  the updates were
                                                  performed.<br class="">
                                                </blockquote>
                                                <div class=""><br
                                                    class="">
                                                </div>
                                                <div class="">I've
                                                  implemented this sort
                                                  of thing, but it's
                                                  only 80% finished and
                                                  has been kicking
                                                  around on the
                                                  low-priority end of my
                                                  todo list for the past
                                                  couple of years. Patch
                                                  attached. It'd be
                                                  great if someone were
                                                  interested in
                                                  finishing this off. I
                                                  won't get to it soon.</div>
                                                <div class=""><br
                                                    class="">
                                                </div>
                                                <div class="">Note that
                                                  it's a warning instead
                                                  of a static analysis
                                                  check, which means that
                                                  it must have a
                                                  very low
                                                  number of false
                                                  positives and that it
                                                  must run quickly.
                                                  The implementation I
                                                  have analyzes
                                                  conditional operators
                                                  and if/else-if chains,
                                                  but doesn't collect
                                                  all the expressions
                                                  through something like
                                                  a && b
                                                  && c &&
                                                  a. That would be the
                                                  next thing to add.</div>
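                                                <div class="">A minimal
                                                  sketch of that missing step,
                                                  assuming a toy expression tree
                                                  rather than Clang's AST: the
                                                  Expr type and the leaf, conj,
                                                  flatten and duplicateOperands
                                                  helpers are hypothetical names
                                                  used only for illustration and
                                                  are not part of the patch. It
                                                  flattens a nested &&
                                                  chain and reports operands
                                                  that occur more than once.</div>
<pre class="">
```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical; not a Clang AST class).
struct Expr {
    std::string op;                 // "&&" for a conjunction, "" for a leaf
    std::string name;               // leaf spelling, e.g. "a"
    std::unique_ptr<Expr> lhs, rhs; // children of a conjunction
};

// Build a leaf operand such as `a`.
std::unique_ptr<Expr> leaf(const std::string &name) {
    auto e = std::make_unique<Expr>();
    e->name = name;
    return e;
}

// Build the conjunction `l && r`.
std::unique_ptr<Expr> conj(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = "&&";
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Collect the operands of a nested && chain, left to right.
void flatten(const Expr &e, std::vector<std::string> &out) {
    if (e.op == "&&") {
        assert(e.lhs && e.rhs); // a conjunction always has two children
        flatten(*e.lhs, out);
        flatten(*e.rhs, out);
    } else {
        out.push_back(e.name);
    }
}

// Return each repeated occurrence of an operand in the chain.
std::vector<std::string> duplicateOperands(const Expr &e) {
    std::vector<std::string> operands, dups;
    flatten(e, operands);
    std::set<std::string> seen;
    for (const auto &o : operands)
        if (!seen.insert(o).second) // insertion fails if already seen
            dups.push_back(o);
    return dups;
}
```
</pre>
                                                <div class="">For
                                                  ((a && b) && c) && a,
                                                  built as
                                                  conj(conj(conj(leaf("a"),
                                                  leaf("b")), leaf("c")),
                                                  leaf("a")),
                                                  duplicateOperands returns a
                                                  single entry, "a". A real
                                                  check would compare AST
                                                  profiles instead of spellings.</div>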
                                                <div class=""><br
                                                    class="">
                                                </div>
                                                <div class="">It does
                                                  have some really cool
                                                  properties that we can
                                                  only get because clang
                                                  integrates closely
                                                  with its preprocessor.
                                                  Consider this sample
                                                  from the testcase:</div>
                                                <div class=""><br
                                                    class="">
                                                  #define num_cpus() (1)<br
                                                    class="">
                                                  #define
                                                  max_omp_threads() (1)<br
                                                    class="">
                                                  int test8(int expr) {<br
                                                    class="">
                                                    if (expr) {<br
                                                    class="">
                                                      return num_cpus();<br
                                                    class="">
                                                    } else {<br class="">
                                                      return
                                                  max_omp_threads();<br
                                                    class="">
                                                    }<br class="">
                                                  }</div>
                                                <div class=""><br
                                                    class="">
                                                </div>
                                                <div class="">We know
                                                  better than to warn on
                                                  that, even though the
                                                  AST looks the same. If
                                                  you instead write
                                                  "return num_cpus();"
                                                  twice, we warn on that
                                                  (that's test9 in the
                                                  testsuite).</div>
                                                <div class=""><br
                                                    class="">
                                                </div>
                                                <div class="">Nick</div>
                                              </div>
                                            </div>
                                          </div>
                                        </blockquote>
                                      </div>
                                      Thanks, this looks very
                                      interesting. This may be a good
                                      start for a student. IIUC,
                                      non-unique exprs are the ones that
                                      have the same source ranges and the
                                      same FileIDs, right? Could this be
                                      upgraded to AST-node (structural)
                                      comparison?</div>
                                  </blockquote>
                                  <div class=""><br class="">
                                  </div>
                                  <div class="">It is an AST-node
                                    comparison. In order to handle the
                                    case of different macros, we ask the
                                    AST nodes what their SourceLocation
                                    was, and factor in the macro ID, if
                                    there was one. A large part of the
                                    patch is a change to the
                                    Stmt::Profile logic to look at all
                                    the SourceLocations in all the
                                    possible AST nodes.</div>
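                                  <div class="">To illustrate the idea
                                    only, here is a rough sketch that
                                    assumes a toy node type instead of
                                    Clang's Stmt and SourceLocation: Node,
                                    profile and returnOfMacro are
                                    hypothetical names, and the macro
                                    field stands in for whatever macro
                                    information the real patch derives
                                    from source locations. Two subtrees
                                    with the same structure get the same
                                    profile unless the macro they expanded
                                    from is factored in.</div>
<pre class="">
```cpp
#include <string>
#include <vector>

// Toy statement node (hypothetical; real Clang nodes carry SourceLocations).
struct Node {
    std::string kind;           // e.g. "ReturnStmt", "IntegerLiteral"
    std::string value;          // literal spelling, empty when unused
    std::string macro;          // macro the node expanded from, empty if none
    std::vector<Node> children; // sub-nodes in source order
};

// Serialize a subtree into a profile string. With includeMacro set, the
// macro a node came from becomes part of the profile.
std::string profile(const Node &n, bool includeMacro) {
    std::string p = "(" + n.kind + ":" + n.value;
    if (includeMacro && !n.macro.empty())
        p += "@" + n.macro;
    for (const Node &c : n.children)
        p += profile(c, includeMacro);
    return p + ")";
}

// Model "return MACRO();" where MACRO() expands to the tokens "(1)".
Node returnOfMacro(const std::string &macro) {
    Node lit{"IntegerLiteral", "1", macro, {}};
    Node paren{"ParenExpr", "", macro, {lit}};
    return Node{"ReturnStmt", "", "", {paren}};
}
```
</pre>
                                  <div class="">With this model,
                                    returnOfMacro("num_cpus") and
                                    returnOfMacro("max_omp_threads") have
                                    equal profiles when includeMacro is
                                    false and different ones when it is
                                    true, while two num_cpus returns stay
                                    equal either way, which matches the
                                    test8 and test9 behavior described
                                    earlier in the thread.</div>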
                                  <div class=""> </div>
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                    <div bgcolor="#FFFFFF"
                                      text="#000000" class=""><span
                                        class="HOEnZb"><font class=""
                                          color="#888888"><br class="">
                                          Vassil</font></span>
                                      <div class="im"><br class="">
                                        <blockquote type="cite" class="">
                                          <div dir="ltr" class="">
                                            <div class="gmail_extra">
                                              <div class="gmail_quote">
                                                <div class=""><br
                                                    class="">
                                                </div>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0px 0px
                                                  0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Coverity



                                                  can detect such
                                                  instances, for
                                                  instance.<br class="">
                                                  <br class="">
                                                  Here is an article
                                                  from 2006 describing
                                                  such a tool:<br
                                                    class="">
                                                  <<a
                                                    moz-do-not-send="true"
href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113"
                                                    target="_blank"
                                                    class="">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.113</a>><br
                                                    class="">
                                                  <br class="">
                                                  Wikipedia says PMD has
                                                  a copy/paste detector
                                                  that works with C++:<br
                                                    class="">
                                                  <<a
                                                    moz-do-not-send="true"
href="http://en.wikipedia.org/wiki/PMD_%28software%29#Copy.2FPaste_Detector_.28CPD.29"
                                                    target="_blank"
                                                    class="">http://en.wikipedia.org/wiki/PMD_(software)#Copy.2FPaste_Detector_.28CPD.29</a>><br
                                                    class="">
                                                  <br class="">
                                                  "Note that CPD works
                                                  with Java, JSP, C,
                                                  C++, C#, Fortran and
                                                  PHP code.<br class="">
                                                  Your own language is
                                                  missing ? See how to
                                                  add it here"<br
                                                    class="">
                                                  <<a
                                                    moz-do-not-send="true"
href="http://pmd.sourceforge.net/snapshot/cpd-usage.html"
                                                    target="_blank"
                                                    class="">http://pmd.sourceforge.net/snapshot/cpd-usage.html</a>><br
                                                    class="">
                                                  <span class=""><font
                                                      class=""
                                                      color="#888888">--<br
                                                        class="">
                                                      "The Direct3D
                                                      Graphics Pipeline"
                                                      free book <<a
                                                        moz-do-not-send="true"
href="http://tinyurl.com/d3d-pipeline" target="_blank" class="">http://tinyurl.com/d3d-pipeline</a>><br
                                                        class="">
                                                           The Computer
                                                      Graphics Museum
                                                      <<a
                                                        moz-do-not-send="true"
href="http://computergraphicsmuseum.org/" target="_blank" class="">http://ComputerGraphicsMuseum.org</a>><br
                                                        class="">
                                                               The
                                                      Terminals Wiki
                                                      <<a
                                                        moz-do-not-send="true"
href="http://terminals.classiccmp.org/" target="_blank" class="">http://terminals.classiccmp.org</a>><br
                                                        class="">
                                                        Legalize
                                                      Adulthood! (my
                                                      blog) <<a
                                                        moz-do-not-send="true"
href="http://legalizeadulthood.wordpress.com/" target="_blank" class="">http://LegalizeAdulthood.wordpress.com</a>><br
                                                        class="">
                                                    </font></span>
                                                  <div class="">
                                                    <div class="">_______________________________________________<br
                                                        class="">
                                                      cfe-dev mailing
                                                      list<br class="">
                                                      <a
                                                        moz-do-not-send="true"
href="mailto:cfe-dev@cs.uiuc.edu" target="_blank" class="">cfe-dev@cs.uiuc.edu</a><br
                                                        class="">
                                                      <a
                                                        moz-do-not-send="true"
href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank"
                                                        class="">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br
                                                        class="">
                                                    </div>
                                                  </div>
                                                </blockquote>
                                              </div>
                                              <br class="">
                                            </div>
                                          </div>
                                          <br class="">
                                          <fieldset class=""></fieldset>
                                          <br class="">
                                        </blockquote>
                                        <br class="">
                                      </div>
                                    </div>
                                  </blockquote>
                                </div>
                                <br class="">
                              </div>
                            </div>
                          </blockquote>
                          <br class="">
                          <br class="">
                          <pre class="moz-signature" cols="72">-- 
--------------------------------------------
Q: Why is this email five sentences or less?
A: <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://five.sentenc.es/">http://five.sentenc.es</a>
</pre>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                </div>
              </blockquote>
            </div>
          </div>
        </blockquote>
      </div>
      <br class="">
    </blockquote>
    <br>
    <br>
  </body>
</html>