[llvm-dev] [GSoC 2017] Clang-based diff tool project

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Mon Mar 20 16:11:47 PDT 2017


On 03/20/2017 05:20 PM, Johannes Altmanninger via llvm-dev wrote:
> Hello,
>
> I am currently studying Computer Science at TU Eindhoven. I am doing a
> course that involves programming assignments on parts of LLVM such as
> lowering, scheduling and optimization. For this year's Google Summer of
> Code I plan to submit a proposal to implement a clang-based diff tool
> [1].
>
> I think it really pays off to have decent developer tools available, as
> they can save tons of time. Clang tooling has obviously been very
> successful.  I think it would be a good idea to develop a diff tool that
> considers the structure of the code, as opposed to just the lines. Plain
> old diff only thinks in terms of "additions" and "deletions", although
> it would be more natural to also consider "updates" and "moves".
>
> A structural diff would work solely on the AST, so formatting changes
> are ignored. It would make it possible to highlight the exact location
> of a change rather than a whole line. Furthermore, it would make it
> possible to compare pieces of code with the same structure (think
> subclasses).
>
> Besides some papers with clever AST-matching algorithms, a quick web
> search yielded [2], which is a proof-of-concept implementation of a
> structural comparison algorithm.  I think it demonstrates rather nicely
> what could be done: movement of chunks of code can be easily traced.

There is also a fair amount of literature on "XML Diff" tools, which
demonstrate the same kind of structural comparison. For example, see:

   http://diffxml.sourceforge.net/
   https://www.cs.hut.fi/~ctl/3dm/
   http://pages.cs.wisc.edu/~yuanwang/xdiff.html

>
> Anyway, one could make all kinds of nice visualizations using an AST
> diff tool. However, I think the initial focus should be on creating one
> whose output resembles traditional diff, except that updates and moves
> are displayed in an easily readable way. That alone could already
> improve developer productivity and happiness.
>
> As of now I have one question: The output of the tool is meant just for
> humans to read (and not for actual patching), right?
>
> To sum up, this could be a very interesting project for me to work on,
> and the result will hopefully be useful to a wide range of developers. I
> would appreciate any feedback. Also, suggestions on how the diff output
> should be presented are welcome. Thank you!

In the long term, I'd love to see a semantic diff tool which could help 
resolve merge conflicts. Merging two branches, both of which have added a 
function to a class, or a member to a class (including updating the 
constructors), is a common problem. Having some way to "apply both 
changes" automatically would be a big help. To that end, I'd hope that 
the output of the tool could indeed be used for actual patching. That 
does not mean it would need to follow the traditional diff format (in 
fact, I'd expect that it would not). Moreover, the 'patch' part of the 
tool might well be out-of-scope for the initial project. I do think, 
however, we should at least have a mode where the 'diff' output is 
precise and machine-readable so that we might later design patching tools.
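
As a rough illustration of what "apply both changes" could mean, here is
a small Python sketch. Everything in it is hypothetical: the JSON shape
of the edit script, the function names and the representation of a class
as a list of member strings are assumptions for the example, not a
proposed format. It handles only the easy case where both branches add
members and neither removes any.

```python
import json


def members_added(base, branch):
    """Members present in branch but not in base, in branch order."""
    return [m for m in branch if m not in base]


def edit_script(class_name, base, branch):
    """Describe the additions as a machine-readable list of operations."""
    return [
        {"op": "insert", "parent": class_name, "member": m}
        for m in members_added(base, branch)
    ]


def merge_additions(base, ours, theirs):
    """Three-way merge of member lists when both sides only add members."""
    for side in (ours, theirs):
        if any(m not in side for m in base):
            raise ValueError("a branch removed a member; not handled here")
    merged = list(ours)
    for m in members_added(base, theirs):
        if m not in merged:  # both sides added the same member
            merged.append(m)
    return merged


base = ["int x;"]
ours = ["int x;", "void f();"]
theirs = ["int x;", "int y;"]

script = edit_script("Widget", base, theirs)
print(json.dumps(script, indent=2))

merged = merge_additions(base, ours, theirs)
print(merged)
```

A line-based merge can flag this as a conflict when both additions touch
the same region of the file, whereas at the structural level the two
insertions are independent. The member case that also requires updating
constructors is much harder and is not covered by this sketch.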

  -Hal

>
> Johannes
>
>
> [1] http://llvm.org/OpenProjects.html#clang-diff-tool
> [2] https://yinwang0.wordpress.com/2012/01/03/ydiff/

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory