[PATCH] D76432: [clangd] Add a tweak for adding "using" statement.

Thu Mar 19 14:15:11 PDT 2020

sammccall added a comment.

This seems really well-thought-out.
I'm being (even) more verbose than usual about the interesting AST details. Please do push back/defer fixing anything that adds a lot of complexity and doesn't seem important.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:20
+
+// Tweak for removing full namespace qualifier under cursor on function calls
+// and types and adding "using" statement instead.
----------------
function calls is obsolete I think, I'd just call these references or DeclRefExpr

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:22
+// and types and adding "using" statement instead.
+class AddUsing : public Tweak {
+public:
----------------
You could put some general comments about the scope/choices here (applicable only to namespaces, insertion strategy)

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:22
+// and types and adding "using" statement instead.
+class AddUsing : public Tweak {
+public:
----------------
sammccall wrote:
> You could put some general comments about the scope/choices here (applicable only to namespaces, insertion strategy)
> Removing qualifier from other instances of the same type/call.

I think this is valuable enough to be worth calling out in the top-level comment as not-yet-done.
The other extensions you mention seem like we might never get around to them and it'd be fine.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:34
+    // True iff "using" already exists and we should not add it.
+    bool IdenticalUsingFound = false;
+    // Location to insert the "using" statement.
----------------
is this redundant with Loc.isValid()?

If so, I'd either just use Loc (with a comment), or use optional<SourceLocation> with a comment, to emphasize the relationship.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:37
+    SourceLocation Loc;
+    // Extra suffix to place after the "using" statement. Depending on what the
+    // insertion point is anchored to, we may need one or more \n to ensure
----------------
can we get away with always inserting \n and relying on clang-format to clean it up?

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:44
+  // SourceLocation if the "using" statement already exists.
+  llvm::Expected<InsertionPointData>
+  findInsertionPoint(const Selection &Inputs);
----------------
this doesn't use any state - make it static or a free function?

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:48
+  // The qualifier to remove. Set by prepare().
+  NestedNameSpecifierLoc NNSL;
+  // The name following NNSL. Set by prepare().
----------------
nit: I quite like the abbreviated names for locals but this deserves a real name like Qualifier or QualifierToRemove.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:54
+
+std::string AddUsing::title() const {
+  return std::string(
----------------
Can we make this dynamic? The more clearly we communicate what is targeted, the more confidence the user will have to use it.

`title()` may only be called after a successful prepare(), so we can use the prepared state.

Maybe something like `Add using-declaration for Bar, and remove qualifier`?

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:56
+  return std::string(
+      llvm::formatv("Add using statement and remove full qualifier."));
+}
----------------
nit: not actually a statement :-(

If "using-declaration" is overly laywerly (and easily confused with using-directive), then maybe we should just spell it out: `Add "using foo::Bar" ...`

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:68
+    auto Loc = D->getUsingLoc();
+    if (Loc.isInvalid() || Loc.isMacroID() ||
+        SM.getFileID(Loc) != SM.getMainFileID()) {
----------------
no need for the first two conditions, getFileID will give you an invalid fileid and a macro fileid respectively

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:71
+      return true;
+    }
+    if (D->getDeclContext()->Encloses(SelectionDeclContext)) {
----------------
nit: we conventionally leave out the {} on one-line if bodies etc.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:72
+    }
+    if (D->getDeclContext()->Encloses(SelectionDeclContext)) {
+      Results.push_back(D);
----------------
This check is sufficient for correctness, but we're still going to traverse all the parts of the AST that can't pass this check.

I think you can override TraverseDecl. If the Decl is a DeclContext and it doesn't enclose the selectiondc, then return (pruning that subtree). Otherwise call RecursiveASTVisitor::TraverseDecl and continue as before.

(I think you still want this check though - pruning decls is enough for performance but may not be for correctness)

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:89
+
+  // If we're looking at a type or NestedNameSpecifier, walk up the tree until
+  // we find the "main" node we care about, which would be ElaboratedTypeLoc or
----------------
I like the idea here, but I'm not sure it quite works. e.g. any typeloc has a directly enclosing typeloc, the inner one can't be targeted. So what about `x::S*`? (pointertypeloc > elaboratedtypeloc > recordtypeloc)?

- looping up from NNS until you get to something else makes sense
- unwrapping typeloc -> elaboratedtypeloc makes sense, but I don't think you ever want to unwrap multiple levels?
- i don't think these cases overlap at all, you want one or the other

Am I missing something?

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:123
+  // namespace prefix and remove that.
+  if (!NNSL.hasQualifier() ||
+      !NNSL.getNestedNameSpecifier()->getAsNamespace()) {
----------------
I think you also want to check that name is non-null, which would be the case for non-identifier names like `foo::operator+(...)`.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:132
+llvm::Expected<AddUsing::InsertionPointData>
+AddUsing::findInsertionPoint(const Selection &Inputs) {
+  auto &SM = Inputs.AST->getSourceManager();
----------------
One possible hiccup is insertion splitting a comment from its target. Cases I can think of:
- `namespace { // anonymous` -> `namespace { using foo::bar; // anonymous`. This seems rare/unimportant.
- `/* docs */ int firstTopLevelDecl` -> `/* docs */ using foo::bar; int firstTopLevelDecl`. This seems common/bad.
- `/* for widgets */using foo::Qux;` -> `/* for widgets */ using foo::bar; using foo::Qux`. This seems rare/unimportant.

I think you could handle the decl case by calling `ASTContext::getLocalCommentForDeclUncached()` and using the getBeginLoc() of the returned comment if any.

That said, this is probably something that we should be doing throughout clangd and we should have some common function in AST.h or so to take care of it. It seems fine to leave a comment and we can think about this later. (Or file a bug where we can dump a list of places that should be accounting for the comment range)

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:144
+
+  llvm::sort(Usings, [&SM](const UsingDecl *L, const UsingDecl *R) {
+    return SM.isBeforeInTranslationUnit(L->getUsingLoc(), R->getUsingLoc());
----------------
This is basically always going to be already sorted. AST traversal isn't strictly left-to-right but pretty close (I think the major exceptions are weird objc things). So you could leave this out.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:148
+  for (auto &U : Usings) {
+    if (SM.isBeforeInTranslationUnit(Inputs.Cursor, U->getUsingLoc()))
+      break;
----------------
ah, neat :-)
but you could check this in the recursive AST visitor instead, and use it to prune the tree.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:150
+      break;
+    if (U->getQualifier()->getAsNamespace() ==
+            NNSL.getNestedNameSpecifier()->getAsNamespace() &&
----------------
Hmm, I *think* it's probably true that NNS->getAsNamespace() always refers to the canonical (first) NamespaceDecl. If not, you can getCanonicalDecl() on both to ensure this.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:161
+    // We could be smarter and check if the deepest block of "using" is sorted
+    // alphabetically and if so, insert at appropriate place. For now, users can
+    // just reformat the file with clang-format or similar.
----------------
the tweak infrastructure should do this in any case, it runs clang-format around changed areas (assuming using-decl sorting is on, which it is in the default styles).

do you not see this behaviour? (unit-tests won't exhibit it as the clang-format step is deliberately excluded from them)

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:206
+  tooling::Replacements R;
+  if (auto Err = R.add(tooling::Replacement(
+          SM, CharSourceRange::getTokenRange(NNSL.getSourceRange()), "",
----------------
(Sorry if some of this is obvious: TL;DR: we should use TokenBuffer to handle macros in this case)

Whenever we try to use SourceLocations to reason about written source code, we have to think about macros. Some libraries try to encapsulate this, but the road to confusion is paved with good intentions, and it's often best to decide on the policy and be explicit about it.

For example, when provided a macro location, the tooling::Replacement constructor uses its spelling location. 
Given: 
```
// imagine we're trying to abstract away namespaces for C or something
#define NS(X) foo::X
NS(Bar) boo;
```
Running this action on `N` would result in changing the definition of the `NS` macro to delete the "foo::", as that's where the qualifier was spelled!

It would be reasonable (but actually not trivial!) to try to detect any macro usage and bail out. The general case of "is there some sequence of source-code tokens that correspond to this range of preprocessed tokens and nothing else" is pretty hard.

However TokenBuffer does have APIs for this exact question, as it was designed for writing refactorings that replace nodes. I think you want:
 - `expandedTokens(NNSL.getSourceRange())`
 - `spelledForExpanded(ExpandedTokens)`
 - `Token::range(SM, Spelled.front(), Spelled.back())`
 - `Replacement("fname", Range.Offset, Range.Length, "")`

(ugh, that's really awkward. Maybe we should have a helper in TokenBuffer that combines 1-3)

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:207
+  if (auto Err = R.add(tooling::Replacement(
+          SM, CharSourceRange::getTokenRange(NNSL.getSourceRange()), "",
+          Inputs.AST->getLangOpts()))) {
----------------
BTW, this is an instance of a common pattern where we have the location of the start of a token (in this case, the last token in the range) and want the token bounds, and end up running the lexer in raw mode to get it! The giveaway is LangOpts which it needs it order to lex.

This is probably not terrible here but it's fiddly and inefficient (when done everywhere) and gets things wrong sometimes. TokenBuffer stores the expanded token stream, the spelled tokens from the main file, and the correspondence between the two, and should generally be used where possible. It's stored in the ParsedAST.

================
Comment at: clang-tools-extra/clangd/refactor/tweaks/AddUsing.cpp:226
+
+    if (auto Err = R.add(tooling::Replacement(SM, InsertionPoint->Loc, 0,
+                                              UsingTextStream.str()))) {
----------------
Again macros.
This seems a bit more pressing, there's a nontrivial chance that you're inserting inside a namespace which is introduced by invoking a `FOO_NS_BEGIN` macro.

Unfortunately I don't see a really simple way to insert after this macro - using the expansion location fails if macro doesn't end after the `}`. (e.g. `ANON_NAMESPACE(f::X y;)`)
(Also you'd need to take care of inserting a space after the macro invocation to avoid `FOO_NS_BEGINusing`.)

I'd suggest simply skipping over any candidate locations (namespaces or using declarations) that aren't in the main file, using the expansion location of the first-top-level-decl fallback, and asserting here that the location is in the main file. (We use raw `assert()` in LLVM)

================
Comment at: clang-tools-extra/clangd/unittests/TweakTests.cpp:2436
+#include "test.hpp"
+namespace {using one::two::ff;
+
----------------
alternately: `using ::one::two::ff`.
Correct more often, uglier/longer, probably a style preference but not worth having an option for. Pick one :-)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76432/new/

https://reviews.llvm.org/D76432