[PATCH] D72874: [clangd] Add a textual fallback for go-to-definition

Thu Mar 5 16:58:43 PST 2020

nridge marked an inline comment as done.
nridge added a comment.

In D72874#1900648 <https://reviews.llvm.org/D72874#1900648>, @sammccall wrote:

> This reminds me: it's not completely obvious what set of "act on symbol under the cursor" things this should (eventually) apply to.
>  I think not having e.g. find-references work makes sense - user should navigate to a "real" occurrence to resolve the ambiguity, and things like code actions are right out.
>  However having `textDocument/hover` work when we have high confidence in results would be really cool.
>  Obviously nothing in scope for this patch, but it seems worth writing this down somewhere, precisely because we shouldn't do it now.

Agreed. Filed https://github.com/clangd/clangd/issues/303 for hover.

In D72874#1901722 <https://reviews.llvm.org/D72874#1901722>, @sammccall wrote:

> (Tactically I think it makes sense to add the basic fallback logic, and follow up with the dependent-code entrypoints, but up to you)

Yep, will handle dependent code in a folow-up patch.

================
Comment at: clang-tools-extra/clangd/SourceCode.h:93
+/// the entire comment or string token.
+SourceRange getWordAtPosition(const Position &Pos, const SourceManager &SM,
+                              const LangOptions &LangOpts);
----------------
sammccall wrote:
> sammccall wrote:
> > consider moving the isLikelyToBeIdentifier check inside. The current API is pretty general and it's not clear yet what (else) it's good for so it's nice to direct towards intended usage.
> > 
> > Also doing the identifier check inside this function is more convenient when it relies on markers outside the identifier range (like doxygen `\p` or backtick-quoted identifiers)
> > 
> > That said, you may still want to return the range when it's not a likely identifier, with a signature like `StringRef getWordAtPosition(bool *LikelyIdentifier = nullptr)`. I'm thinking of the future case where the caller wants to find a nearby matching token and resolve it - resolving belongs in the caller so there's not much point having this function duplicate the check.
> This doesn't use the SourceManager-structure of the file, so the natural signature would be `StringRef getWordAtPosition(StringRef Code, unsigned Offset)`.
> 
> (what are the practical cases where langopts is relevant?)
Now that I'm using `wordTouching()` from D75479, I think this comment no longer applies?

================
Comment at: clang-tools-extra/clangd/XRefs.cpp:226
+
+std::vector<LocatedSymbol> navigationFallback(ParsedAST &AST,
+                                              const SymbolIndex *Index,
----------------
nridge wrote:
> sammccall wrote:
> > this function should have a high-level comment describing the strategy and the limitations (e.g. idea of extending it to resolve nearby matching tokens).
> > 
> > A name like `locateSymbolNamedTextuallyAt` would better describe what this does, rather than what its caller does.
> > 
> > I would strongly consider exposing this function publicly for the detailed tests, and only smoke-testing it through `locateSymbolAt`. Having to break the AST in tests or otherwise rely on the "primary" logic not working is brittle and hard to verify.
> > I would strongly consider exposing this function publicly for the detailed tests, and only smoke-testing it through locateSymbolAt. Having to break the AST in tests or otherwise rely on the "primary" logic not working is brittle and hard to verify.
> 
> I was going to push back against this, but I ended up convincing myself that your suggestion is better :)
> 
> For the record, the consideration that convinced me was:
> 
> Suppose in the future we add fancier AST-based logic that handles a case like `T().foo()` (for example, by surveying types for which `T` is actually substituted, and offering `foo()` inside those types). If all we're testing is "navigation works for this case" rather than "navigation works for this case //via the AST-based mechanism//", we could regress the AST logic but have our test still pass because the testcase is simple enough that the text-based navigation fallback (that we're adding here) works as well.
Renamed and comment added. I still need to revise the tests.

================
Comment at: clang-tools-extra/clangd/XRefs.cpp:240
+  Req.ProximityPaths = {MainFilePath};
+  // FIXME: Set Req.Scopes to the lexically enclosing scopes.
+  // For extra strictness, consider AnyScope=false.
----------------
sammccall wrote:
> FWIW the API for this is visibleNamespaces() from SourceCode.cpp.
> (No enclosing classes, but I suspect we can live without them once we have a nearby-tokens solution too)
Thanks, that's convenient!

Out of curiosity, though: is the reason to prefer this lexer-based approach over hit-testing the query location against `NamespaceDecl`s in the AST, mainly for performance?

================
Comment at: clang-tools-extra/clangd/XRefs.cpp:242
+  // For extra strictness, consider AnyScope=false.
+  Req.AnyScope = true;
+  // Choose a limit that's large enough that it contains the user's desired
----------------
It occured to me that I don't think we can do `AnyScope=false` if we want to handle dependent member cases like `T().uniqueMethodName()`. The members we want to find in such a case will often be both in a different file (so nearby-tokens won't handle them) and not in any visible scope.

================
Comment at: clang-tools-extra/clangd/XRefs.cpp:250
+  auto MainFileURI = URIForFile::canonicalize(MainFilePath, MainFilePath);
+  bool HaveResultInMainFile = false;
+  Index->fuzzyFind(Req, [&](const Symbol &Sym) {
----------------
sammccall wrote:
> This is an interesting signal, I think there are two sensible ways to go about it:
>  - assume results in this file are more likely accurate than those in other files. In this case we should at minimum be using this in ranking, but really we should just drop all cross-file results if we have an in-file one.
>  - don't rely on index for main-file cases, and rely on "find nearby matching token and resolve it instead". That can easily handled cases defined/referenced in the main-file with sufficient accuracy, including non-indexed symbols. So here we can assume this signal is always false, and drop it.
> 
Since you've implemented "find nearby matching token and resolve it", I went with the second approach.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72874/new/

https://reviews.llvm.org/D72874