[cfe-dev] [RFC] Adding refactoring support to Clang

Manuel Klimek via cfe-dev cfe-dev at lists.llvm.org
Tue Jun 20 02:55:37 PDT 2017


On Tue, Jun 20, 2017 at 11:26 AM Alex Lorenz <aleksei_lorenz at apple.com>
wrote:

>
> On 20 Jun 2017, at 09:27, Manuel Klimek <klimek at google.com> wrote:
>
> First, thanks for proposing this - this fits pretty much exactly with what
> we had envisioned / slowly started working towards, so big thumbs up :)
>
> Others have already made a lot of interesting observations / points,
> adding a couple of thoughts inline.
>
> On Fri, Jun 16, 2017 at 1:57 AM Alex Lorenz <aleksei_lorenz at apple.com>
> wrote:
>
>>
>> Hi,
>>
>> This is a proposal for a new Clang-based refactoring engine (as indicated
>> by Duncan last week [1]). The new refactoring engine should be based on
>> both the existing code from llvm.org and the code that I've just
>> committed to github.com/apple/swift-clang [2] (Please note that it has
>> some outdated designs and APIs that I'd like to replace). This email is
>> split into three different sections:
>> - The first section lists the refactoring actions supported by Xcode 9.
>> - The second section describes the actual proposal.
>> - The third section outlines an action plan that I would like to follow
>> when submitting patches for the new refactoring engine.
>>
>> I'm looking forward to any feedback that you might have!
>>
>> # Background
>>
>> At first I would like to provide a little bit of background on the
>> C++/Objective-C refactorings in Xcode. Xcode 9 contains a Clang-based
>> refactoring engine that supports several refactoring operations. The
>> primary operation that it supports is “Rename”. The refactoring engine
>> gathers the renamed symbol occurrences using two distinct mechanisms:
>>
>> - Local rename. It gathers occurrences for declarations that are
>> declared in a function's body. It is based on existing code from `
>> clang-rename` that was moved over to Clang's `Tooling` library. It was
>> then modified slightly to fit our goals. I also added support for some
>> Objective-C declarations.
>>
>> - Global rename. It gathers occurrences for all other declarations. We
>> decided to put the high-level logic for global rename into Xcode's indexer
>> because Xcode had to support cross-language rename (between Swift and
>> Objective-C). The indexer uses the core engine in Clang’s lib/Index, so
>> Clang is still very much involved in this process. The indexer also
>> delegates low-level “indexed-file” renames to Clang. This lets us gather
>> occurrences for symbolic references that don't have the full location
>> information (like Objective-C selector names), textual matches for
>> occurrences of a matching string in comments, and textual matches for a
>> matching filename in the `#include` directives.
>>
>> Xcode 9 also support some local refactoring actions:
>> - Extract function/method.
>> - Extract duplicate expression.
>> - Add missing switch cases (GIF:
>> https://reviews.llvm.org/file/data/q2qz53jd7ug4cpx4zlfx/PHID-FILE-aq75y26bgkfvm5e775jt/switchCases).
>> It is also available as a fixit.
>> - [C++] Fill-in method stubs from abstract base classes.
>> - Convert if to switch.
>> - [ObjC] Fill in protocol stubs. It is also available as a fixit.
>> - [ObjC] Wrap in NSLocalizedString
>>
>> Another action that Xcode supports is called “Generate Missing Function
>> Definitions” (GIF:
>> https://reviews.llvm.org/file/data/u4kzt62wjdin5dpl65jl/PHID-FILE-w7jl2vz42b5aybd6oizj/generateFunctionDefinitions).
>> This action actually depends on the indexer to determine the set of methods
>> that have to be implemented and where should the generated code go. It’s
>> also the one and only cross-TU refactoring action that is currently
>> supported (without taking "rename" into account).
>>
>> # Refactoring Engine Proposal
>>
>> I would like to create a new refactoring engine that will go into Clang's
>>  `Tooling` library. The engine's API should be designed with different
>> use cases in mind, and I would like to ensure that the engine accommodates
>> the following three use cases:
>>
>> - Libclang. It should provide a C-based refactoring API that we can use
>> in Xcode.
>> - Clangd.
>> - A standalone refactoring tool, `clang-refactor`. This tool will be
>> used for testing. It will also allow users to perform refactoring actions
>> without using other tools.
>>
>> The engine should try to follow the following high level goals:
>> - It should be easy for others to create, test, and integrate new local
>> refactoring operations.
>> - Refactoring operations should be allowed to fail. The engine should
>> have good error reporting mechanisms.
>> - Initiation via source selection should be quick (if we assume that we
>> have access to a parsed AST) as we'd like to let a user know which actions
>> are available as soon as possible.
>> - The engine should allow at least a basic level of configurability and
>> customization (users should be able select options for a refactoring action
>> if an action chooses to provide some). In the future we might also want to
>> provide more powerful configuration mechanisms that will allow IDEs to
>> generate custom UIs for different actions.
>>
>> ## Initiation
>>
>> I think that there are two equally important modes of initiation that
>> should be supported by Clang's refactoring engine:
>>
>> - Initiation via source selection. This mode is important for IDEs.
>> - Initiation via AST queries. This mode is important for standalone
>> tools / scripts that want to refactor an entire project.
>>
>> Unfortunately our code doesn't provide a good enough abstraction that
>> generalizes well for different actions. I'm still hashing out a better
>> initiation scheme, and I will send a separate RFC with our proposal for
>> initiation in the near future. Let me know if you have any
>> ideas/suggestions!
>>
>> ## Refactoring
>>
>> I would like to propose keeping the two modes that were previously
>> described for the "Rename" operation. This will probably mean that a
>> standalone tool like `clang-refactor` won't support the global rename
>> mode of rename. On the other hand, if Clangd will gain some form of
>> indexing infrastructure (which seems likely given the recent interest in
>> indexing [3]), it will be able to use the engine's global renaming
>> infrastructure as well. I'm hoping the community will provide some input on
>> what should happen to `clang-rename`, as I'm not sure whether it should
>> stay or be deprecated in favour of new tools.
>>
>> Other refactoring operations will primarily work with source
>> modifications. I think that the refactoring engine should provide a set of
>> shared utility components that can be reused by different operations. For
>> example, things like return type deduction for extracted declarations
>> should be accessible to all operations. This kind of design should also
>> make testing easier.
>>
>> Certain refactoring actions (e.g. "Fill-in switch cases") work well
>> together with diagnostic fix-its, as they can be presented in an editor and
>> a user can initiate a refactoring operation using a UI that's already
>> familiar to them. I would like to ensure that this kind of mechanism is
>> accessible to refactoring operations that need it. Our current code
>> contains two actions that are implemented in the `Edit` library. They're
>> usable from `Sema`, and we take advantage of that to generate the
>> fix-its during semantic analysis.
>>
>> ## Cross-TU Operations & Indexer Queries
>>
>> The majority of refactoring actions that we've implemented perform source
>> transformations in a single translation unit (TU). However, there are some
>> actions that need to work across TU boundaries. Xcode 9 includes one such
>> cross-TU action - “Generate Missing Function Definitions”. A naïve cross-TU
>> solution that I've looked at initially was based on an API that had a
>> series of callbacks between Xcode's indexer and libclang. However, I
>> quickly discovered that this model didn't work that well, as it interfered
>> with the code that loaded and handled different TUs in the indexer.
>>
>> Note: while "Rename" is a global operation, it doesn't actually interact
>> with other TUs because source replacements can be derived from the data
>> stored in the project's index. Thus, our implementation doesn't treat it as
>> a cross-TU operation.
>>
>> The current implementation of the “Generate Missing Function Definitions"
>> action is based on the "refactoring continuations" API. A refactoring
>> continuation is just a suspended refactoring operation that can be executed
>> in a TU that's different to the one in which the operation was created.
>> Each continuation includes TU-specific state that's automatically managed
>> by Clang, as well as a set of questions about that state. Each individual
>> question is called an "indexer query".
>>
>> Indexer queries are the core structures that enable cross-TU actions. An
>> external component (like Xcode's indexer) has to evaluate them before
>> running the continuation in order for the operation to work correctly. Each
>> query can be described using four different attributes:
>>
>> - Question: The kind of question that should be answered. Our
>> implementation exposes the following two kinds at the libclang level: `
>> Decl_FileThatShouldImplement` and `Decl_IsDefined`.
>> - Subject: The TU-specific state from the parent continuation that acts
>> as the question's subject.
>> - Result: The value that has to be set by the external indexer component
>> before running the refactoring continuation.
>> - Action: A flag that represents an additional action that has to be
>> performed by the indexer before running the continuation. We use the `
>> RunContinuationInTUThatHasThisFile` action in Xcode's indexer to
>> determine which TU should be loaded and used when running the refactoring
>> continuation.
>>
>> Please note that while I mention "Xcode's indexer" quite a lot in this
>> section, our refactoring continuation API is indexer-agnostic, and is
>> designed to work with any indexer. The indexer just has to implement
>> support for the indexer queries that are required for a particular
>> refactoring operation, and the refactoring engine handles everything else.
>> The code sample below shows how our code constructs the refactoring
>> continuation for the "Generate Missing Function Definitions" action:
>>
>> ```
>> return continueInExternalASTUnit(
>> /*Indexer query that gets annotated with the
>> RunContinuationInTUThatHasThisFile action */
>> fileThatShouldContainImplementationOf(Container),
>> /*Pointer to the continuation function*/ runInImplementationAST,
>> /*A state value*/ Container,
>> /*An additional indexer query over some state*/
>> filter(llvm::makeArrayRef(SelectedMethods), [](const DeclEntity &D) {
>> return !D.isDefined(); }));
>> ```
>>
>
> I think the harder design space will be how to run something on "all files
> containing references to <X>". What we definitely need is an interface that
> allows the ops to be sharded out across a fleet of machines (we use C++ ;)
>  based on the expected amount of work. This has some restrictions on the
> design, specifically also is why we use SourceManager independent classes
> to handle replacements etc.
>
>
> I see. I haven’t thought that much about sharding across machines.
> However, I do think that we should aim to design structures (for this part
> of the API in particular) in a manner that can be easily serialized and
> passed between different processes and machines.
>

Yep, exactly. This is basically a prediction that folks from our side will
put some eye on these things during code review, as we have some experience
what does (and does not ;) work here.

Looking forward to the patches :D


> Generally, I fully agree that we can get an interface here that is indexer
> independent.
>
>
>>
>> In practical terms, the following series of events occur in Xcode when it
>> performs the “Generate Missing Function Definitions” operation:
>>
>> - After the user chooses the "Generate Missing Function Definitions”
>> action, Xcode notifies its indexer that it has to perform that action. The
>> indexer then uses libclang to get the refactoring results after running the
>> action's operation in the initiation TU.
>> - Xcode's indexer sees that the results provide a refactoring
>> continuation instead of source replacements, so it starts looking at the
>> attached indexer queries.
>> - Xcode's indexer evaluates each query by looking at the question's kind
>> and the TU-specific state (A set of `CXCursor`s). It sends back the
>> result to libclang using a set of query-agnostic routines that consume
>> different data types.
>> - After evaluating the queries, Xcode's indexer loads the TU in which
>> the refactoring continuation should run. It knows which TU it has to load
>> by looking at the result of the query that included the `
>> RunContinuationInTUThatHasThisFile` action flag.
>> - Xcode's indexer then invokes the refactoring continuation in the
>> freshly loaded TU. Clang converts the continuation's state from previously
>> serialized TU-independent state to a state that's specific to the new TU,
>> and continues running the refactoring operation with this state.
>> - Xcode's indexer then receives the source replacements produced by the
>> refactoring operation and passes them back to Xcode so that it can apply
>> the replacements to the source.
>>
>> I would like to propose an API model that's based on our current
>> "refactoring continuations" API for the cross-TU operations in the new
>> refactoring engine. Clang will provide a C++ API that can be used in a way
>> that's similar to the code presented above. It will also manage all of the
>> interactions with libclang and Clangd, so individual refactoring operations
>> won't have to worry about the fine details of a particular indexer.
>> Furthermore, the new refactoring engine will limit the set of available
>> actions based on the set of queries that are supported by an indexer, so
>> the indexer that chooses to support refactoring continuations won't have to
>> support all of the queries.
>>
>> ## Testing & Tooling
>>
>> The `clang-refactor` tool will be used to test the refactoring engine. I
>> would like to propose the following action-agnostic command line interface:
>>
>> ```
>> clang-refactor <action> [options] <initiation file> [additional build
>> options]
>> ```
>>
>> I think that the tool should use comment-based annotations in the test
>> source files to control things like initiation. For example, given a
>> selection annotation `// selection: +1:1 -> +2:12` on line 12, `
>> clang-refactor` will map it to a selection range of 13:1 -> 14:12 and
>> will initiate a certain refactoring action using that selection.
>>
>
> Ok, already bikeshedding: can we call it a range instead of a selection?
>
>
> Sure. The keyword is not what's important though ;)
>
>
>
>> In addition to `clang-refactor`, I would also like to provide an
>> additional stress-test tool that will work with entire projects (using
>> compilation databases), and will be capable of:
>>
>> - Initiating/performing refactoring actions at each token in all of the
>> source files in a project.
>> - Verifying the consistency between the indexer and the refactoring
>> engine to ensure that they have the same model of the source.
>>
>> This tool will also come with a script that generates compilation
>> databases for Clang's tests.
>>
>> I also have a great little reviewing tool that helps others to visualize
>> the refactoring changes for a particular action, and I'd be more than happy
>> to share it if you're interested. The tool generates a set of HTML pages
>> that contain the sources of the refactoring tests. The pages get annotated
>> to show the changes that were made during refactoring. We found that
>> looking at this visualization instead of the raw `CHECK` lines makes
>> reviewers' life much easier.
>>
>
> Nice! Additionally, I'd also suggest to use unit tests, especially for
> small dedicated regression tests where the AST is complex (might be more of
> a problem for C++).
>
>
> I generally agree, I think that our code relied on end-to-end regression
> tests too much. Some self-contained components will be tested with unit
> tests instead in the new engine.
>
>
>
>>
>> # Action Plan
>>
>> Initially I would like to adopt the following high-level plan for the
>> creation of the new refactoring engine:
>>
>> - Upstream any non-refactoring specific changes from our code.
>> - Move the core of `clang-rename` over to clang (`clang-rename` should
>> still work as before).
>> - Start working on the `clang-refactor` tool so that it can use the
>> previously moved code. Move the tests from `clang-rename` over to Clang
>> and convert them to `clang-refactor` tests.
>> - Submit our changes to the core of the `clang-rename` engine.
>> - Create and test the new global rename component.
>> - Create an AST source selection component. Initiation via source
>> selection will be based on this component.
>> - Create a refactoring initiation API that will be easy to use.
>> - Start submitting our local refactoring actions.
>> - Start working on the libclang API.
>> - Create and submit everything else that remains, e.g. Cross-TU
>> Operations & Indexer Queries.
>>
>
> Sounds generally good. Thanks!
>
>
>>
>> Cheers,
>> Alex
>>
>> [1]: http://lists.llvm.org/pipermail/cfe-dev/2017-June/054108.html
>> [2]:
>> https://github.com/apple/swift-clang/commit/9890adfbee8f854732d0093bc8b2a32be1be8844
>> [3]: http://lists.llvm.org/pipermail/cfe-dev/2017-May/053869.html
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170620/7aa7d656/attachment.html>


More information about the cfe-dev mailing list