[PATCH] D72874: [clangd] Add a textual fallback for go-to-definition

Fri Mar 6 04:24:16 PST 2020

sammccall added a comment.

I'd like to sync up briefly on https://github.com/clangd/clangd/issues/241 so we know where we want to end up.

I think this is in good shape and certainly doesn't need a bigger scope, just want to be able to reason about how things will fit together.

================
Comment at: clang-tools-extra/clangd/SourceCode.h:93
+/// the entire comment or string token.
+SourceRange getWordAtPosition(const Position &Pos, const SourceManager &SM,
+                              const LangOptions &LangOpts);
----------------
nridge wrote:
> sammccall wrote:
> > sammccall wrote:
> > > consider moving the isLikelyToBeIdentifier check inside. The current API is pretty general and it's not clear yet what (else) it's good for so it's nice to direct towards intended usage.
> > > 
> > > Also doing the identifier check inside this function is more convenient when it relies on markers outside the identifier range (like doxygen `\p` or backtick-quoted identifiers)
> > > 
> > > That said, you may still want to return the range when it's not a likely identifier, with a signature like `StringRef getWordAtPosition(bool *LikelyIdentifier = nullptr)`. I'm thinking of the future case where the caller wants to find a nearby matching token and resolve it - resolving belongs in the caller so there's not much point having this function duplicate the check.
> > This doesn't use the SourceManager-structure of the file, so the natural signature would be `StringRef getWordAtPosition(StringRef Code, unsigned Offset)`.
> > 
> > (what are the practical cases where langopts is relevant?)
> Now that I'm using `wordTouching()` from D75479, I think this comment no longer applies?
I think the reasons still apply - D75479 doesn't need to check likelihood (it considers actual use as identifier evidence enough) so I didn't include it there, but we should eventually merge these more thoroughly I think. No need to do that until we actually want to implement different heuristics though.

================
Comment at: clang-tools-extra/clangd/XRefs.cpp:240
+  Req.ProximityPaths = {MainFilePath};
+  // FIXME: Set Req.Scopes to the lexically enclosing scopes.
+  // For extra strictness, consider AnyScope=false.
----------------
nridge wrote:
> sammccall wrote:
> > FWIW the API for this is visibleNamespaces() from SourceCode.cpp.
> > (No enclosing classes, but I suspect we can live without them once we have a nearby-tokens solution too)
> Thanks, that's convenient!
> 
> Out of curiosity, though: is the reason to prefer this lexer-based approach over hit-testing the query location against `NamespaceDecl`s in the AST, mainly for performance?
Well, it was written for fallback code completion when we have no AST at all :-)

Gathering from the AST should be better, though it's not quite as simple as hit-testing (you also have to find `using namespace`). But this exists today, which is a feature!

================
Comment at: clang-tools-extra/clangd/XRefs.cpp:258
+
+    // For now, only consider exact name matches, including case.
+    // This is to avoid too many false positives.
----------------
nridge wrote:
> sammccall wrote:
> > I wouldn't bother qualifying this as "for now". Any code is subject to change in the future, but requiring an exact name match for index-based results seems more like a design decision than a fixme.
> Do we want to rule out the possibility of handling typos in an identifier name in a comment (in cases where we have high confidence in the match, e.g. a long / unique name, small edit distance, only one potential match) in the future?
> 
> This is also relevant to whether we want to keep the `FuzzyMatcher` or not.
No idea whether typo-correction is a good idea in principle - tradeoff between current false negatives and false positives+compute.

However neither FuzzyMatcher nor the existing index implementations support/can easily support real typo correction, and it seems implausible to me we'd add it for this feature.

Compare to e.g:
 - allowing case-insensitive match in some cases: `fooBar` vs `FooBar` is a plausible "typo". This is easy to implement.
 - correct the typo where we see the fixed version used as an identifier in this file (and not the original). Excludes some cases, but drives false-positives way down, and easy to implement.

I don't think we need to rule things out, but I'm uncertain enough about the approach to think that putting comments, fuzzymatcher etc here speculatively isn't worth it.

================
Comment at: clang-tools-extra/clangd/XRefs.cpp:489
+  Index->fuzzyFind(Req, [&](const Symbol &Sym) {
+    auto MaybeDeclLoc =
+        symbolLocationToLocation(Sym.CanonicalDeclaration, MainFilePath);
----------------
nridge wrote:
> Sorry this location-setting code is so messy. All my attempts to make it more concise have been thwarted by `llvm::Expected`'s very restrictive API.
Ugh, don't get me started on Error/Expected :-(

I'd love to get rid of it somehow, but it seems like we'd inevitably just end up with the new thing + Error/Expected + error_code/ErrorOr + return-a-bool, and I'm not sure it'd be better.

(If you have more energy than me, I'd enthusiastically +1 an llvm-dev proposal to drop the clever checks from llvm::Error, and I know some others who would...)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72874/new/

https://reviews.llvm.org/D72874