[cfe-dev] EuroLLVM roundtable notes: Future Refactoring with AST-Matchers

Sat May 4 07:12:39 PDT 2019

Hi,

We had a roundtable in Brussels about AST-Matchers. Unfortunately I lost
the notes I took there, so these notes are drawn from memory. There
were about 6 participants, who can expand on this if they have more
information.

  https://eurollvm2019.sched.com/event/MZH6/round-tables

1) Features and UI tool

We talked about some of the features and the UI tool that I demoed in
my talk:

  https://steveire.wordpress.com/2019/04/30/the-future-of-ast-matching

I strongly recommend watching my EuroLLVM talk if you are interested in
this topic.

We talked about the features of matcher and source location discovery as
well as AST matcher debugging. Those features assist in creating
new/bespoke clang-tidy checks.

The consensus was that such features and tools should be enabled by
upstreaming changes to clang. I have some patches up already:

  https://reviews.llvm.org/search/query/rkXq09eyETfd/#R

The Qt UI is not something I would try to upstream to an llvm-project
repo. Really, the good outcome would be for existing IDEs to get these
features, because you don't want to use a simple IDE that I made. My
patches put all of the features into clang in a UI-independent way, so
that should be possible.

I'll put the UI on GitHub though, once enough patches are upstreamed to
make it possible to build and use without patching clang.

2) Ignoring implicit nodes

I'm currently working on what I think is the most important change,
which is to make it possible to ignore invisible nodes when matching on
the AST (and when dumping the AST). I mostly talked about this
separately with Manuel. We think it can work with a few adjustments.

See the difference here with all of the implicit nodes inserted to
handle conversions:

  http://ce.steveire.com/z/lHYwEH

The already-existing ignoringImplicit() matcher is not sufficient
because:

* That matcher does not ignore implicit CXXMemberCallExpr etc.
* It is inconvenient to put a matcher between each pair of expressions.
* A novice should be able to write a simple matcher which handles the
'difficult' cases without having to know that those cases exist and how
to handle them.

For completeness, here is the matcher you have to write today to match
those cases:

  http://ce.steveire.com/z/PwzkwW

Having a mode of ignoring invisible nodes is more novice-friendly,
especially if it is the default.
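
To make the idea concrete, here is a minimal sketch in Python over a toy
tree. It is not the clang API: the Node class, the implicit flag and the
visible_children() and matches() helpers are hypothetical names invented
for illustration, and the tree shape is made up rather than taken from a
real AST dump. It only shows what "looking through" invisible nodes means
for someone writing a matcher.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                      # e.g. "ReturnStmt" (toy label only)
    implicit: bool = False         # True for nodes not written in the source
    children: List["Node"] = field(default_factory=list)


def visible_children(node: Node) -> List[Node]:
    """Return the children as written in source, looking through implicit nodes."""
    result = []
    for child in node.children:
        if child.implicit:
            # Skip the implicit node itself, but keep what is underneath it.
            result.extend(visible_children(child))
        else:
            result.append(child)
    return result


def matches(node: Node, kind: str, child_kinds: List[str]) -> bool:
    """True if node has this kind and its visible children have exactly these kinds."""
    return node.kind == kind and [c.kind for c in visible_children(node)] == child_kinds


# Toy tree: a return statement whose operand is wrapped in two implicit nodes.
tree = Node("ReturnStmt", children=[
    Node("ExprWithCleanups", implicit=True, children=[
        Node("ImplicitCastExpr", implicit=True, children=[
            Node("IntegerLiteral"),
        ]),
    ]),
])

# The "novice" pattern succeeds without mentioning the implicit wrappers.
found = matches(tree, "ReturnStmt", ["IntegerLiteral"])
```

Without such a mode, the pattern has to name each wrapper explicitly,
which is the inconvenience described in the list above.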

3) Indexing

With some form of the features I showed in the presentations, the
workflow of creating a bespoke tool for a one-way refactoring gets
easier and more discoverable, with quicker iteration.

However, one of the remaining issues is that we still have to run the
tool on all translation units in the compile database, even if we are
only changing one functionDecl and the callExprs whose callee() is that
declaration.

You can see in the demo in the presentation that I filter files by name
to run the javascript refactoring tool, so that I didn't have to run it
on all files in the llvm build:

  https://youtu.be/yqi8U8Q0h2g?t=1263

It would be great to have some tool which, given a declaration, would
return a list of translation units/compile_db entries which use that
declaration.

Something like an indexer could be used, so we talked about that a bit.
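
As a rough illustration of the kind of filtering meant here, the sketch
below narrows a compile database down to entries whose main source file
mentions a name. It is a crude textual stand-in, not an indexer: it
assumes the usual compile_commands.json layout ("directory" and "file"
keys), and the function names are hypothetical. It will miss uses that
only appear through headers or macros, and it will over-match unrelated
identifiers that happen to share the name.

```python
import json
import re
from pathlib import Path
from typing import Callable, List, Optional


def load_compile_db(path: str) -> List[dict]:
    """Read a compile_commands.json file into a list of entries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def entry_source_path(entry: dict) -> Path:
    """Resolve the entry's "file" against its "directory" (absolute files win)."""
    return Path(entry["directory"]) / entry["file"]


def mentions(text: str, name: str) -> bool:
    """True if name appears in text as a whole identifier-like word."""
    return re.search(r"\b" + re.escape(name) + r"\b", text) is not None


def candidate_entries(entries: List[dict], name: str,
                      read_text: Optional[Callable[[Path], str]] = None) -> List[dict]:
    """Keep only the entries whose main source file mentions name."""
    if read_text is None:
        read_text = lambda p: p.read_text(encoding="utf-8", errors="ignore")
    return [e for e in entries if mentions(read_text(entry_source_path(e)), name)]
```

The refactoring tool would then be run only on the returned entries. A
real indexer would answer the same question by resolving the declaration
itself rather than by searching for its spelling.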

4) Distributed refactoring

We also talked about distributed refactoring as a means of generating
refactoring output quickly. I demo'd the use of Icecream for this at
code::dive a few months ago:

  https://www.youtube.com/watch?v=_T-5pWQVxeE&feature=youtu.be&t=2610

We talked about the possibility of using other distribution tooling to
achieve this.

Goma was discussed for a bit. I'm not sure, but it might be what Google
uses to do that kind of distributed refactoring. At the time, the server
part of Goma was not open source, but it now is:

  https://chromium.googlesource.com/infra/goma/server/

meaning it might now be a way to facilitate distributed refactoring tasks.

5) Transformer library and ASTER

We also talked about the transformer library, which is getting into
clang already, and about ASTER. I demo'd both to give an idea.

The transformer library enables creating node-based transformations
without needing to use a rewriter directly with source locations etc.

The ASTER library aims to generate AST matchers from before/after states
of the code. I don't know its state of development.

6) Syntax Tree

We also spoke a bit about Syntax Tree, a new proposal which aims to
enable refactoring based on an AST which is really about Syntax and is
really a Tree, instead of requiring developers to output new source code.

Part of this seems to be a response to the same problem I try to solve
by ignoring invisible nodes in the current AST.

I think the source code would be generated from a SyntaxTree instance,
so the task would be to generate a SyntaxTree for a translation unit,
manipulate the SyntaxTree nodes iteratively as desired, then output the
result by processing the tree.

The SyntaxTree differs from the clang AST in many ways, and it is a
distinct representation, separate from the clang AST.

There seem to be two options for how SyntaxTree would be integrated
into the workflow:

* A matcher library similar to the current AST matchers would be written
for SyntaxTree. It would be used to find 'interesting' nodes.

* A developer would be required to use the current AST matchers to
locate an interesting node, then map that result to a SyntaxTree for
manipulation and output.

This seems to be an open question for SyntaxTree, as each option has
advantages and disadvantages.

Conclusion)

There seem to be 3 ways currently in some state of development to
process AST matcher results and mechanically refactor code.

1) Use Rewriter directly with SourceLocations, either in C++ or in
JavaScript as I presented, or now also in Python with

  https://github.com/firolino/python-clang-tooling

2) Use something like the new Transformer library with templated output

3) Use something like SyntaxTree to manipulate nodes directly and
generate the code from the result.
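
To give a feel for what option 1 involves, here is a minimal sketch in
Python of applying location-based edits to a source buffer. It is a toy
model, not clang's Rewriter: the Replacement class and
apply_replacements() are hypothetical stand-ins, and plain character
offsets stand in for SourceLocations. The point it shows is that edits
are applied from the end of the buffer backwards, so that earlier
offsets stay valid as the text changes length.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replacement:
    offset: int   # start of the range to replace, in characters
    length: int   # number of characters to remove
    text: str     # text to put in their place


def apply_replacements(source: str, replacements: List[Replacement]) -> str:
    """Apply non-overlapping replacements to source and return the new text."""
    ordered = sorted(replacements, key=lambda r: r.offset)
    # Overlapping edits have no well-defined result, so reject them.
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.offset + prev.length > cur.offset:
            raise ValueError("overlapping replacements")
    # Work backwards so that offsets of earlier edits are not shifted.
    for r in reversed(ordered):
        source = source[:r.offset] + r.text + source[r.offset + r.length:]
    return source


code = "int r = foo(1) + foo(2);"
edits = [
    Replacement(code.index("foo"), len("foo"), "newFoo"),
    Replacement(code.rindex("foo"), len("foo"), "newFoo"),
]
updated = apply_replacements(code, edits)
```

Options 2 and 3 aim to hide this kind of offset bookkeeping from the
person writing the refactoring.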

All will have some commonalities, but different use-cases and
motivations. The future is bright!

Thanks,

Stephen.