[PATCH] D56611: [clangd] A code action to swap branches of an if statement

Tue Jan 29 09:35:03 PST 2019

sammccall accepted this revision.
sammccall added inline comments.
This revision is now accepted and ready to land.

================
Comment at: clang-tools-extra/clangd/SourceCode.h:63

+/// Turns a token range into a half-open range and checks its correctness.
+/// The resulting range will have only valid source location on both sides, both
----------------
I think the semantics of the locations (expansion vs spelling) need to be spelled out here, at least in the comment.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/SwapIfBranches.cpp:60
+        Ctx.getSourceManager(), Ctx.getLangOpts(),
+        SourceRange(If->getIfLoc(), If->getCond() ? If->getCond()->getEndLoc()
+                                                  : If->getIfLoc()));
----------------
what's the case where the condition is null?

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/SwapIfBranches.cpp:63
+    if (R && (halfOpenRangeContains(Ctx.getSourceManager(), *R, CursorLoc) ||
+              R->getEnd() == CursorLoc)) {
+      Result = If;
----------------
the explicit checks for equality at the endpoint seem... odd
do you need a halfOpenRangeTouches() or so?

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/SwapIfBranches.cpp:101
+
+  auto ThenRng = toHalfOpenFileRange(SrcMgr, Ctx.getLangOpts(),
+                                     If->getThen()->getSourceRange());
----------------
do this in apply()?

This is an example refactoring, if we have to add a bunch of checks because it fires too often then we should try to solve that systematically.

================
Comment at: clang-tools-extra/unittests/clangd/TweakTests.cpp:46
+
+class TweakTest : public ::testing::Test {
+public:
----------------
fixture has no state, can these be free functions?

================
Comment at: clang-tools-extra/unittests/clangd/TweakTests.cpp:116
+TEST_F(TweakTest, SwapIfBranches) {
+  llvm::StringLiteral ID = "SwapIfBranches";
+
----------------
why StringLiteral here?

================
Comment at: clang-tools-extra/unittests/clangd/TweakTests.cpp:140
+  )cpp";
+  EXPECT_THAT_EXPECTED(apply(ID, Input), HasValue(Output));
+}
----------------
`checkTransform(ID, Input, Output)`, to avoid mixing styles?

================
Comment at: clangd/refactor/tweaks/SwapIfBranches.cpp:50
+
+class FindIfUnderCursor : public RecursiveASTVisitor<FindIfUnderCursor> {
+public:
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > (Mostly this is food for thought for later - we shouldn't try to solve everything in this patch)
> > 
> > Two efficiency problems here:
> >  - doing one traversal per tweak is wasteful (when we have >1 tweaks, but worth at least *thinking* about)
> >  - traversing the entire AST rather than just the nodes over the cursor is bad news (though "over the cursor" may not be trivial to define)
> > 
> > Idea for how to "framework" this problem away:
> > Add `vector<DynTypedNode> SelectedNodes` to the inputs, the idea being that this is the stack from the narrowest `Expr` under the cursor to the `TranslationUnitDecl` at the top.
> > This can be produced by a RecursiveASTVisitor (that either traverses the whole AST, or just the bits under the cursor if that's simple enough).
> > Tweaks would iterate over the nodes from narrowest to broadest, deciding whether to select this node for processing, continue iterating, or bail out.
> > 
> > Matching in checks are then pretty easy to write, we haven't removed too much flexibility in flow control, and it's pretty hard to write a slow check.
> > 
> > There are some complications:
> >  - we need access to parents somehow (e.g. consider the case of adding NS qualifiers, we want to match on names but then traverse up to the containing declrefexpr to get the nested namespace loc)
> >  - the interesting nodes may be a tree rather than a stack, because nodes overlap in the source. We could store a postorder traversal... but this makes e.g. "bail out when you hit a functiondecl" harder - you probably want to bail out of the current branch only.
> >  - things get complicated when we think about macros - depending on the semantics we want, it may be hard for the framework to prune parts of the tree that aren't under the cursor withouth traversing the whole AST.
> I mostly agree, however I still wonder whether that would be inefficient in some cases. E.g. consider a corner if the input selection from LSP contains the whole file? I see two options there: (1) putting all nodes of a file into `vector<DynTypedNode>`, (2) putting only the top-level `TranslationUnitDecl` there.
> 
> It seems easy to end up with (1) to make the interface useful, however we probably prefer (2) because otherwise we're back into the worst-case scenario, i.e. "every  tweak traverses all the nodes"
> 
> > we need access to parents somehow (e.g. consider the case of adding NS qualifiers, we want to match on names but then traverse up to the containing declrefexpr to get the nested namespace loc)
> 
> I wonder if it's possible to instead write the checks so that they only look at the children of the nodes in a vector? My bet is that almost all of the checks only need to look one or two levels down, so this shouldn't turn into inefficiency.
> 
> 
> I initially thought about providing a very limited set of AST nodes as inputs for the actions that are computed efficiently. Specifically, I thought about having:
> 
> ```
> struct Selection {
>   Decl *DeclUnderCursor; // the innermost declaration under CursorLoc.
>   Stmt *StmtUnderCursor; // the innermost statement under CursorLoc.
>   Expr *ExprUnderCursor; // the innermost expression under CursorLoc.
> 
>   // For some tweaks (extract var, extract method) we will also need these:
>   Expr *SelectedExpression; // expression exactly matching selection.
>   vector<Stmt*> SelectedStatements; // a list of statements exactly matching 
> selection.
> };
> ```
> 
> The checks would only look at those few statements, e.g. "Swap If" would try `dyn_cast_or_null<Stmt*>(S.StmtUnderCursor)` and do the same checks for cursor locations.
> 
> We could expand the set of inputs as needed. This would mean that we don't have to design the framework, but that also mean writing some checks would involve some work in the shared code that computes these inputs.
> 
> I bet that complexity-wise we would end up with simpler checks and more complicated shared code to compute the inputs. OTOH, we won't need to struggle with the limitations of the AST to design the framework that would properly handle all interesting cases without doing extra work.
> 
> It seems easy to end up with (1) to make the interface useful, however we probably prefer (2) because otherwise we're back into the worst-case scenario, i.e. "every tweak traverses all the nodes"

I think the principled version of 2) is something like: for every node, find its own range, and its expanded range until the next non-comment token. e.g.:
```                 #######################                
foo(); /* bar */ if (x) { y } else { z } /*baz*/ abort()
      ###########################################       ```

The directly interesting nodes are those whose range intersects with the selection, and whose expanded range contains the selection, and none of whose children contains the selection.

Or something like that. I think you'd be able to invert the fuzziness here (do range-expansion on the selection endpoints instead of the nodes) to make the checks cheap.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56611/new/

https://reviews.llvm.org/D56611