[cfe-dev] [RFC] Adding refactoring support to Clang

Thu Jun 15 16:57:14 PDT 2017

Hi,

This is a proposal for a new Clang-based refactoring engine (as indicated by Duncan last week [1]). The new refactoring engine should be based on both the existing code from llvm.org and the code that I've just committed to github.com/apple/swift-clang [2] (Please note that it has some outdated designs and APIs that I'd like to replace). This email is split into three different sections:
- The first section lists the refactoring actions supported by Xcode 9.
- The second section describes the actual proposal.
- The third section outlines an action plan that I would like to follow when submitting patches for the new refactoring engine.

I'm looking forward to any feedback that you might have!

# Background

At first I would like to provide a little bit of background on the C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based refactoring engine that supports several refactoring operations. The primary operation that it supports is “Rename”. The refactoring engine gathers the renamed symbol occurrences using two distinct mechanisms:

- Local rename. It gathers occurrences for declarations that are declared in a function's body. It is based on existing code from `clang-rename` that was moved over to Clang's `Tooling` library. It was then modified slightly to fit our goals. I also added support for some Objective-C declarations.

- Global rename. It gathers occurrences for all other declarations. We decided to put the high-level logic for global rename into Xcode's indexer because Xcode had to support cross-language rename (between Swift and Objective-C). The indexer uses the core engine in Clang’s lib/Index, so Clang is still very much involved in this process. The indexer also delegates low-level “indexed-file” renames to Clang. This lets us gather occurrences for symbolic references that don't have the full location information (like Objective-C selector names), textual matches for occurrences of a matching string in comments, and textual matches for a matching filename in the `#include` directives.

Xcode 9 also support some local refactoring actions:
- Extract function/method.
- Extract duplicate expression.
- Add missing switch cases (GIF: https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases). It is also available as a fixit.
- [C++] Fill-in method stubs from abstract base classes.
- Convert if to switch.
- [ObjC] Fill in protocol stubs. It is also available as a fixit.
- [ObjC] Wrap in NSLocalizedString

Another action that Xcode supports is called “Generate Missing Function Definitions” (GIF: https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions). This action actually depends on the indexer to determine the set of methods that have to be implemented and where should the generated code go. It’s also the one and only cross-TU refactoring action that is currently supported (without taking "rename" into account).

# Refactoring Engine Proposal

I would like to create a new refactoring engine that will go into Clang's `Tooling` library. The engine's API should be designed with different use cases in mind, and I would like to ensure that the engine accommodates the following three use cases:

- Libclang. It should provide a C-based refactoring API that we can use in Xcode.
- Clangd.
- A standalone refactoring tool, `clang-refactor`. This tool will be used for testing. It will also allow users to perform refactoring actions without using other tools.

The engine should try to follow the following high level goals:
- It should be easy for others to create, test, and integrate new local refactoring operations.
- Refactoring operations should be allowed to fail. The engine should have good error reporting mechanisms.
- Initiation via source selection should be quick (if we assume that we have access to a parsed AST) as we'd like to let a user know which actions are available as soon as possible.
- The engine should allow at least a basic level of configurability and customization (users should be able select options for a refactoring action if an action chooses to provide some). In the future we might also want to provide more powerful configuration mechanisms that will allow IDEs to generate custom UIs for different actions.

## Initiation

I think that there are two equally important modes of initiation that should be supported by Clang's refactoring engine:

- Initiation via source selection. This mode is important for IDEs.
- Initiation via AST queries. This mode is important for standalone tools / scripts that want to refactor an entire project.

Unfortunately our code doesn't provide a good enough abstraction that generalizes well for different actions. I'm still hashing out a better initiation scheme, and I will send a separate RFC with our proposal for initiation in the near future. Let me know if you have any ideas/suggestions!

## Refactoring

I would like to propose keeping the two modes that were previously described for the "Rename" operation. This will probably mean that a standalone tool like `clang-refactor` won't support the global rename mode of rename. On the other hand, if Clangd will gain some form of indexing infrastructure (which seems likely given the recent interest in indexing [3]), it will be able to use the engine's global renaming infrastructure as well. I'm hoping the community will provide some input on what should happen to `clang-rename`, as I'm not sure whether it should stay or be deprecated in favour of new tools.

Other refactoring operations will primarily work with source modifications. I think that the refactoring engine should provide a set of shared utility components that can be reused by different operations. For example, things like return type deduction for extracted declarations should be accessible to all operations. This kind of design should also make testing easier.

Certain refactoring actions (e.g. "Fill-in switch cases") work well together with diagnostic fix-its, as they can be presented in an editor and a user can initiate a refactoring operation using a UI that's already familiar to them. I would like to ensure that this kind of mechanism is accessible to refactoring operations that need it. Our current code contains two actions that are implemented in the `Edit` library. They're usable from `Sema`, and we take advantage of that to generate the fix-its during semantic analysis.

## Cross-TU Operations & Indexer Queries

The majority of refactoring actions that we've implemented perform source transformations in a single translation unit (TU). However, there are some actions that need to work across TU boundaries. Xcode 9 includes one such cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU solution that I've looked at initially was based on an API that had a series of callbacks between Xcode's indexer and libclang. However, I quickly discovered that this model didn't work that well, as it interfered with the code that loaded and handled different TUs in the indexer.

Note: while "Rename" is a global operation, it doesn't actually interact with other TUs because source replacements can be derived from the data stored in the project's index. Thus, our implementation doesn't treat it as a cross-TU operation.

The current implementation of the “Generate Missing Function Definitions" action is based on the "refactoring continuations" API. A refactoring continuation is just a suspended refactoring operation that can be executed in a TU that's different to the one in which the operation was created. Each continuation includes TU-specific state that's automatically managed by Clang, as well as a set of questions about that state. Each individual question is called an "indexer query".

Indexer queries are the core structures that enable cross-TU actions. An external component (like Xcode's indexer) has to evaluate them before running the continuation in order for the operation to work correctly. Each query can be described using four different attributes:

- Question: The kind of question that should be answered. Our implementation exposes the following two kinds at the libclang level: `Decl_FileThatShouldImplement` and `Decl_IsDefined`.
- Subject: The TU-specific state from the parent continuation that acts as the question's subject.
- Result: The value that has to be set by the external indexer component before running the refactoring continuation.
- Action: A flag that represents an additional action that has to be performed by the indexer before running the continuation. We use the `RunContinuationInTUThatHasThisFile` action in Xcode's indexer to determine which TU should be loaded and used when running the refactoring continuation.

Please note that while I mention "Xcode's indexer" quite a lot in this section, our refactoring continuation API is indexer-agnostic, and is designed to work with any indexer. The indexer just has to implement support for the indexer queries that are required for a particular refactoring operation, and the refactoring engine handles everything else. The code sample below shows how our code constructs the refactoring continuation for the "Generate Missing Function Definitions" action:

```
return continueInExternalASTUnit(
/*Indexer query that gets annotated with the RunContinuationInTUThatHasThisFile action */ fileThatShouldContainImplementationOf(Container),
/*Pointer to the continuation function*/ runInImplementationAST,
/*A state value*/ Container,
/*An additional indexer query over some state*/ filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) { return !D.isDefined(); }));
```

In practical terms, the following series of events occur in Xcode when it performs the “Generate Missing Function Definitions” operation:

- After the user chooses the "Generate Missing Function Definitions” action, Xcode notifies its indexer that it has to perform that action. The indexer then uses libclang to get the refactoring results after running the action's operation in the initiation TU.
- Xcode's indexer sees that the results provide a refactoring continuation instead of source replacements, so it starts looking at the attached indexer queries.
- Xcode's indexer evaluates each query by looking at the question's kind and the TU-specific state (A set of `CXCursor`s). It sends back the result to libclang using a set of query-agnostic routines that consume different data types.
- After evaluating the queries, Xcode's indexer loads the TU in which the refactoring continuation should run. It knows which TU it has to load by looking at the result of the query that included the `RunContinuationInTUThatHasThisFile` action flag.
- Xcode's indexer then invokes the refactoring continuation in the freshly loaded TU. Clang converts the continuation's state from previously serialized TU-independent state to a state that's specific to the new TU, and continues running the refactoring operation with this state.
- Xcode's indexer then receives the source replacements produced by the refactoring operation and passes them back to Xcode so that it can apply the replacements to the source.

I would like to propose an API model that's based on our current "refactoring continuations" API for the cross-TU operations in the new refactoring engine. Clang will provide a C++ API that can be used in a way that's similar to the code presented above. It will also manage all of the interactions with libclang and Clangd, so individual refactoring operations won't have to worry about the fine details of a particular indexer. Furthermore, the new refactoring engine will limit the set of available actions based on the set of queries that are supported by an indexer, so the indexer that chooses to support refactoring continuations won't have to support all of the queries.

## Testing & Tooling

The `clang-refactor` tool will be used to test the refactoring engine. I would like to propose the following action-agnostic command line interface:

```
clang-refactor <action> [options] <initiation file> [additional build options]
```

I think that the tool should use comment-based annotations in the test source files to control things like initiation. For example, given a selection annotation `// selection: +1:1 -> +2:12` on line 12, `clang-refactor` will map it to a selection range of 13:1 -> 14:12 and will initiate a certain refactoring action using that selection.

In addition to `clang-refactor`, I would also like to provide an additional stress-test tool that will work with entire projects (using compilation databases), and will be capable of:

- Initiating/performing refactoring actions at each token in all of the source files in a project.
- Verifying the consistency between the indexer and the refactoring engine to ensure that they have the same model of the source.

This tool will also come with a script that generates compilation databases for Clang's tests.

I also have a great little reviewing tool that helps others to visualize the refactoring changes for a particular action, and I'd be more than happy to share it if you're interested. The tool generates a set of HTML pages that contain the sources of the refactoring tests. The pages get annotated to show the changes that were made during refactoring. We found that looking at this visualization instead of the raw `CHECK` lines makes reviewers' life much easier.

# Action Plan

Initially I would like to adopt the following high-level plan for the creation of the new refactoring engine:

- Upstream any non-refactoring specific changes from our code.
- Move the core of `clang-rename` over to clang (`clang-rename` should still work as before).
- Start working on the `clang-refactor` tool so that it can use the previously moved code. Move the tests from `clang-rename` over to Clang and convert them to `clang-refactor` tests.
- Submit our changes to the core of the `clang-rename` engine.
- Create and test the new global rename component.
- Create an AST source selection component. Initiation via source selection will be based on this component.
- Create a refactoring initiation API that will be easy to use.
- Start submitting our local refactoring actions.
- Start working on the libclang API.
- Create and submit everything else that remains, e.g. Cross-TU Operations & Indexer Queries.

Cheers,
Alex

[1]: http://lists.llvm.org/pipermail/cfe-dev/2017-June/054108.html
[2]: https://github.com/apple/swift-clang/commit/9890adfbee8f854732d0093bc8b2a32be1be8844
[3]: http://lists.llvm.org/pipermail/cfe-dev/2017-May/053869.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170615/07b22452/attachment.html>