<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 30/03/17 17:03, Mehdi Amini wrote:<br>
    </div>
    <blockquote
      cite="mid:25EAE982-85ED-4D02-8987-B34F53041CF3@apple.com"
      type="cite">
      <br class="">
      <div>
        <blockquote type="cite" class="">
          <div class="">On Mar 30, 2017, at 6:56 AM, Vassil Vassilev
            <<a moz-do-not-send="true"
              href="mailto:v.g.vassilev@gmail.com" class="">v.g.vassilev@gmail.com</a>>
            wrote:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <div bgcolor="#FFFFFF" text="#000000" class="">
              <div class="moz-cite-prefix">Hi,<br class="">
                <br class="">
                  This seems like a very exciting project.<br class="">
              </div>
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        <div>Do I take it that you’re volunteering to mentor it? ;-)</div>
      </div>
    </blockquote>
    Not really ;) We would be happy to provide help reviewing patches if
    you choose to work with the clone detection infrastructure.<br>
    <blockquote
      cite="mid:25EAE982-85ED-4D02-8987-B34F53041CF3@apple.com"
      type="cite">
      <div>
        <div><br class="">
        </div>
        <br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div bgcolor="#FFFFFF" text="#000000" class="">
              <div class="moz-cite-prefix"> <br class="">
                  As part of GSoC 2016, Raphael developed a code clone
                detection tool
                (<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="https://docs.google.com/presentation/d/1mJ6dA6XmAQ8s8Zqm_j518yoW-_QZ72e69fPG1u_nbj8/edit#slide=id.g35f391192_00">https://docs.google.com/presentation/d/1mJ6dA6XmAQ8s8Zqm_j518yoW-_QZ72e69fPG1u_nbj8/edit#slide=id.g35f391192_00</a>).
                We are working on turning the infrastructure into a
                reusable set of components (<a moz-do-not-send="true"
                  class="moz-txt-link-freetext"
                  href="https://reviews.llvm.org/D23418">https://reviews.llvm.org/D23418</a>).<br
                  class="">
                <br class="">
                  Building on D23418, Raphael hacked together a few lines
                of code that address Greg's proposal. <br
                  class="">
                <br class="">
                  <br class="">
                <table class="" border="1" cellpadding="2"
                  cellspacing="2" width="100%">
                  <tbody class="">
                    <tr class="">
                      <td class="" valign="top">r1</td>
                      <td class="" valign="top">r2<br class="">
                      </td>
                    </tr>
                    <tr class="">
                      <td class="" valign="top">
                        <div class="de1"><span class="st0"><span
                              class="st0">int main(int argc, const char
                              **argv) {<span class="es1"></span><br
                                class="">
                            </span>  switch (argc) {<span class="es1"></span></span>
                          <div class="de1"><span class="st0">  }<span
                                class="es1"></span></span></div>
                          <div class="de1"><span class="st0">  if (argc
                              > 2) {<span class="es1"></span></span></div>
                          <div class="de2"><span class="st0">    return
                              1;<span class="es1"></span></span></div>
                          <div class="de1"><span class="st0">  }<span
                                class="es1"></span></span></div>
                          <div class="de1"><span class="st0">  while
                              (false);<span class="es1"></span></span></div>
                          <div class="de1"><span class="st0">  int
                              funkyVariable = 1;</span></div>
                          <div class="de1"><span class="st0"> 
                              funkyVariable++;<span class="es1"></span></span></div>
                          <span class="st0">}</span><span class="st0"><span
                              class="st0"></span></span></div>
                      </td>
                      <td class="" valign="top"><span class="st0"><span
                            class="st0">int main(int argc, const char
                            **argv) {<span class="es1"></span><br
                              class="">
                          </span></span>
                        <div class="de1"><span class="st0">  if (argc
                            > 2) {<span class="es1"></span></span></div>
                        <div class="de2"><span class="st0">    return 1;<span
                              class="es1"></span></span></div>
                        <div class="de1"><span class="st0">  }<span
                              class="es1"></span></span></div>
                        <span class="st0">  switch (argc) {<span
                            class="es1"></span></span>
                        <div class="de1"><span class="st0">  }<span
                              class="es1"></span></span></div>
                        <div class="de1"><span class="st0">  while
                            (false);<span class="es1"></span></span></div>
                        <div class="de1"><span class="st0">  int
                            funkyVariable = 1;</span></div>
                        <div class="de1"><span class="st0"> 
                            funkyVariable++;<span class="es1"></span></span></div>
                        <span class="st0">}</span></td>
                    </tr>
                  </tbody>
                </table>
                <br class="">
                  ./clangDiff<br class="">
                  Change: SwitchStmt moved from line 2 to line 5<br
                  class="">
                  Change: IfStmt moved from line 4 to line 2<br class="">
                 <br class="">
              </div>
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        <div>Neat! :)</div>
        <div><br class="">
        </div>
        <br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div bgcolor="#FFFFFF" text="#000000" class="">
              <div class="moz-cite-prefix"> <br class="">
                  Here is how it looks <a moz-do-not-send="true"
                  class="moz-txt-link-freetext"
href="https://gist.github.com/Teemperor/b252bae4b2544f57d6bb9580a0e890e4">https://gist.github.com/Teemperor/b252bae4b2544f57d6bb9580a0e890e4</a><br
                  class="">
                <br class="">
                  Let us know if we can help you further with this!<br
                  class="">
              </div>
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        <div>I’d be happy if you could take the lead. Johannes asked
          earlier how to get started in Clang and demonstrate his
          ability. Can you suggest a bug to fix or a small improvement
          to implement?</div>
      </div>
    </blockquote>
    <br>
    I am afraid I cannot dedicate a lot of time to this. I believe
    the following bugs are not difficult to fix:<br>
    Documentation:<br>
      * <a class="moz-txt-link-freetext" href="https://bugs.llvm.org/show_bug.cgi?id=16106">https://bugs.llvm.org/show_bug.cgi?id=16106</a><br>
      * <a class="moz-txt-link-freetext" href="https://bugs.llvm.org/show_bug.cgi?id=19260">https://bugs.llvm.org/show_bug.cgi?id=19260</a><br>
      * <a class="moz-txt-link-freetext" href="https://bugs.llvm.org/show_bug.cgi?id=5935">https://bugs.llvm.org/show_bug.cgi?id=5935</a><br>
      * <a class="moz-txt-link-freetext" href="https://bugs.llvm.org/show_bug.cgi?id=10257">https://bugs.llvm.org/show_bug.cgi?id=10257</a><br>
    <br>
    C++:<br>
      * <a class="moz-txt-link-freetext" href="https://bugs.llvm.org/show_bug.cgi?id=24883">https://bugs.llvm.org/show_bug.cgi?id=24883</a><br>
      * <a class="moz-txt-link-freetext" href="https://bugs.llvm.org/show_bug.cgi?id=27532">https://bugs.llvm.org/show_bug.cgi?id=27532</a> (tricky)<br>
    <br>
    <blockquote
      cite="mid:25EAE982-85ED-4D02-8987-B34F53041CF3@apple.com"
      type="cite">
      <div>
        <div>He also asked about libclang vs. LibTooling; I am not sure
          whether anyone has answered yet.</div>
      </div>
    </blockquote>
    :(<br>
    <br>
    -- Vassil<br>
    <blockquote
      cite="mid:25EAE982-85ED-4D02-8987-B34F53041CF3@apple.com"
      type="cite">
      <div>
        <div><br class="">
        </div>
        <div>— </div>
        <div>Mehdi</div>
        <div><br class="">
        </div>
        <div><br class="">
        </div>
        <br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div bgcolor="#FFFFFF" text="#000000" class="">
              <div class="moz-cite-prefix"> <br class="">
                -- Vassil and Raphael<br class="">
                <br class="">
                On 23/03/17 18:41, Greg Clayton via llvm-dev wrote:<br
                  class="">
              </div>
              <blockquote
                cite="mid:C096D49D-7D2C-4189-8682-35A818B4FA8C@gmail.com"
                type="cite" class="">
                <pre class="" wrap="">My original idea was to write a semantic diff tool that just does some simple things up front:

Create an MD5 hash for every top-level block of the code. Start by finding matching block delimiters ('{' and '}', '(' and ')') and remember their source locations and MD5 values. Run a normal diff on the code and see which blocks the diffs fall into. Then try to figure out where things moved, possibly by delving deeper into each block that the diff touches. If any blocks moved to a completely different location, try to detect that by matching the blocks' MD5 values.

For example if you had:

int main(int argc, const char **argv) {
  if (argc > 2) {
  }
  switch (argc) {
  }
}

You would first compute MD5s for the span between '(' and ')' on the "main" line and for the block that starts at the '{' at the end of that line and runs to the end of the code. Now suppose the code is changed to:

int main(int argc, const char **argv) {
  switch (argc) {
  }
  if (argc > 2) {
  }
}

The diff would show that the "if" is gone from its old position and that a new "if" appears after the switch in the new version of the file. We would notice that the diff falls inside this block from the first version:

{
  if (argc > 2) {
  }
  switch (argc) {
  }
}

And inside this block from the second version:

{
  switch (argc) {
  }
  if (argc > 2) {
  }
}

So we would then compute the MD5 for the blocks nested inside each of these blocks and try to match them up. The MD5 computation would of course skip whitespace outside string literals, so that only the meaningful characters are hashed. This simple approach could work on almost any language, without needing to compile each file correctly with all the right options.

Greg

</pre>
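                <div class="">A minimal Python sketch of the block-hashing idea described above, under stated assumptions: it only splits the body of the outermost '{' ... '}' block into top-level statements, does not handle string literals, comments or ';' inside parentheses, and hashes each statement with all whitespace removed. The names top_level_chunks and moved_chunks are hypothetical, and difflib stands in for the "normal diff" step; this is not how clangDiff or D23418 works.</div>
                <pre class="" wrap="">
```python
import hashlib
from difflib import SequenceMatcher

# The two revisions from the r1/r2 table earlier in this thread.
R1 = """int main(int argc, const char **argv) {
  switch (argc) {
  }
  if (argc > 2) {
    return 1;
  }
  while (false);
  int funkyVariable = 1;
  funkyVariable++;
}
"""

R2 = """int main(int argc, const char **argv) {
  if (argc > 2) {
    return 1;
  }
  switch (argc) {
  }
  while (false);
  int funkyVariable = 1;
  funkyVariable++;
}
"""


def top_level_chunks(source):
    """Split the outermost block body into (start_line, md5, text) tuples.

    Simplification (assumption): a chunk ends at ';' or at a closing
    brace that returns to the outermost block.
    """
    chunks = []
    depth = 0
    line = 1
    text = []
    start_line = 0
    for ch in source:
        if ch == "\n":
            line += 1
        if ch == "{":
            depth += 1
            if depth == 1:
                continue  # opening brace of the outermost block
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break  # end of the outermost block
        if depth == 0 or ch.isspace():
            continue  # skip the function header and all whitespace
        if not text:
            start_line = line
        text.append(ch)
        if depth == 1 and ch in ";}":
            body = "".join(text)
            digest = hashlib.md5(body.encode()).hexdigest()
            chunks.append((start_line, digest, body))
            text = []
    return chunks


def moved_chunks(old_source, new_source):
    """Return (text, old_line, new_line) for chunks that changed order."""
    old = top_level_chunks(old_source)
    new = top_level_chunks(new_source)
    matcher = SequenceMatcher(None, [c[1] for c in old],
                              [c[1] for c in new], autojunk=False)
    # Chunks inside a matching run kept their relative order.
    stable = set()
    for i, _j, n in matcher.get_matching_blocks():
        stable.update(range(i, i + n))
    # If a hash occurs several times, the last occurrence wins.
    new_line_of = {digest: start for start, digest, _body in new}
    moves = []
    for index, (old_line, digest, body) in enumerate(old):
        if index not in stable and digest in new_line_of:
            moves.append((body, old_line, new_line_of[digest]))
    return moves


if __name__ == "__main__":
    for body, old_line, new_line in moved_chunks(R1, R2):
        print(body, "moved from line", old_line, "to line", new_line)
```
</pre>
                <div class="">On the r1/r2 example this sketch reports only the if statement as moved (line 4 to line 2), because moving one of the two statements is enough to explain the reordering.</div>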
                <blockquote type="cite" class="">
                  <pre class="" wrap="">On Mar 20, 2017, at 4:47 PM, Mehdi Amini <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:mehdi.amini@apple.com"><mehdi.amini@apple.com></a> wrote:

(+CC: Greg Clayton who gave me this idea in the first place)

</pre>
                  <blockquote type="cite" class="">
                    <pre class="" wrap="">On Mar 20, 2017, at 3:20 PM, Johannes Altmanninger via llvm-dev <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org"><llvm-dev@lists.llvm.org></a> wrote:

Hello,

I am currently studying Computer Science at TU Eindhoven. I am doing a
course that involves programming assignments on parts of LLVM such as
lowering, scheduling and optimization. For this year's Google Summer of
Code I plan to submit a proposal to implement a clang-based diff tool
[1].
</pre>
                  </blockquote>
                  <pre class="" wrap="">Great! I look forward to seeing this :)

</pre>
                  <blockquote type="cite" class="">
                    <pre class="" wrap="">I think it really pays off to have decent developer tools available, as
they can save tons of time. Clang tooling has obviously been very
successful.  I think it would be a good idea to develop a diff tool that
considers the structure of the code, as opposed to just the lines. Plain
old diff only thinks in terms of "additions" and "deletions", although
it would be more natural to also consider "updates" and "moves".

A structural diff would work solely on the AST, so formatting changes
are ignored. It would allow highlighting the exact location of a
change rather than a whole line. Furthermore, it would allow comparing
pieces of code with the same structure (think subclasses).

Besides some papers with clever AST-matching algorithms, a quick web
search yielded [2], which is a proof-of-concept implementation of a
structural comparison algorithm.  I think it demonstrates rather nicely
what could be done: movement of chunks of code can be easily traced.

Anyway, one could make all kinds of nice visualizations using an AST
diff tool. However, I think the initial focus should be on creating
one with output similar to traditional diff, except that updates and
moves are displayed in an easily readable way, which could already
improve developer productivity and happiness.

As of now I have one question: The output of the tool is meant just for
humans to read (and not for actual patching), right?
</pre>
                  </blockquote>
                  <pre class="" wrap="">Yes. But we usually develop software as libraries. In practice, I expect the main part of the work to be writing an API that generates an “in-memory” representation of the diff.

A tool that generates human-readable textual output is likely to be the first client of this API, and is likely to be critical for functionally testing it early in development. In the future, I hope it would enable other clients, such as graphical diff tools or git merge-resolution tools, to plug in as well.

Best,

— 
Mehdi

</pre>
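                  <div class="">One possible shape for the “in-memory” representation mentioned above, as a hedged Python sketch. All names (ChangeKind, Change, render) are hypothetical and not part of any Clang API; the four change kinds follow the additions, deletions, updates and moves named in the proposal, and the text renderer simply mirrors the clangDiff lines shown further up in this message.</div>
                  <pre class="" wrap="">
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ChangeKind(Enum):
    INSERT = "insert"
    DELETE = "delete"
    UPDATE = "update"
    MOVE = "move"


@dataclass(frozen=True)
class Change:
    kind: ChangeKind
    node: str                       # AST node kind, e.g. "IfStmt"
    old_line: Optional[int] = None  # None for insertions
    new_line: Optional[int] = None  # None for deletions


def render(changes):
    """Textual client of the in-memory diff: one line per change."""
    lines = []
    for c in changes:
        if c.kind is ChangeKind.MOVE:
            lines.append(f"Change: {c.node} moved from line "
                         f"{c.old_line} to line {c.new_line}")
        elif c.kind is ChangeKind.INSERT:
            lines.append(f"Change: {c.node} inserted at line {c.new_line}")
        elif c.kind is ChangeKind.DELETE:
            lines.append(f"Change: {c.node} deleted from line {c.old_line}")
        else:
            lines.append(f"Change: {c.node} updated at line {c.new_line}")
    return "\n".join(lines)


if __name__ == "__main__":
    diff = [
        Change(ChangeKind.MOVE, "SwitchStmt", old_line=2, new_line=5),
        Change(ChangeKind.MOVE, "IfStmt", old_line=4, new_line=2),
    ]
    print(render(diff))
```
</pre>
                  <div class="">Keeping the list of Change objects separate from render is what would let a graphical client or a merge tool consume the same data without parsing text.</div>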
                </blockquote>
                <pre class="" wrap="">_______________________________________________
LLVM Developers mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
              </blockquote>
              <p class=""><br class="">
              </p>
            </div>
          </div>
        </blockquote>
      </div>
      <br class="">
    </blockquote>
    <p><br>
    </p>
  </body>
</html>