[PATCH] D88814: [clangd] Enable partial namespace matches for workspace symbols

Thu Oct 8 09:19:21 PDT 2020

kadircet added inline comments.

================
Comment at: clang-tools-extra/clangd/FindSymbols.cpp:44

+// Returns true if \p Query can be found as a sub-scope inside \p Scope.
+bool approximateScopeMatch(llvm::StringRef Scope,
----------------
sammccall wrote:
> kadircet wrote:
> > sammccall wrote:
> > > I had a little trouble following this...
> > > It seems a little simpler (fewer vars to track) if we avoid the up-front split on scopes.
> > > 
> > > ```
> > > assert(Scope.empty() || Scope.endswith("::")); // or handle in some way
> > > // Walk through Scope, consuming matching tokens from Query.
> > > StringRef First;
> > > while (!Scope.empty() && !Query.empty()) {
> > >   tie(First, Scope) = Scope.split("::");
> > >   if (Query.front() == First)
> > >     Query = Query.drop_front();
> > > }
> > > return Query.empty(); // all components of query matched
> > > ```
> > > 
> > > in fact we can avoid preprocessing query too:
> > > 
> > > ```
> > > // Query is just a StringRef
> > > assert(Scope.empty() || Scope.endswith("::")); // or handle in some way
> > > assert(Query.empty() || Query.endswith("::"));
> > > 
> > > // Walk through Scope, consuming matching tokens from Query.
> > > StringRef First;
> > > while (!Scope.empty() && !Query.empty()) {
> > >   tie(First, Scope) = Scope.split("::");
> > >   Query.consume_front(StringRef(First.data(), First.size() + 2) /*Including ::*/);
> > > }
> > > return Query.empty(); // all components of query matched
> > > ```
> > Yes but they would do different things. I believe the confusion is caused by usage of `sub-scope` without a clear definition.  The codes you've suggested are performing sub-sequence matches rather than sub-string(i.e. we are looking for a contigious segment in `Scope` that matches `Query`).
> > 
> > I believe a query of the form `a::c::` shouldn't be matched by `a::b::c::`. I can try simplifying the logic, but it would be nice to agree on the behaviour :D.
> > 
> > Sorry if I miscommunicated this so far.
> Ah right, I was indeed misreading the code. Let's have some definitions...
> 
> given query `a::b::Foo` with scope `a::b::`
> 
> | | a::b:: | `W::a::b::` | `a::X::b::` | `a::b::Y` |
> | exact | * | | | |
> | prefix |*| | | * |
> | suffix | *|* | | |
> | substring | * | * | | * |
> | subsequence | * | * | * | * |
> 
> These support correcting different types of "errors":
>  - exact: none
>  - prefix: may omit namespaces immediately before Foo
>  - suffix: query may be rooted anywhere (other than global ns)
>  - substring: query rooted anywhere, omit namespaces before Foo
>  - subsequence: may omit any component
> 
> We know "exact" is too strict.
> 
> I think "prefix" and by extension "substring" aren't particularly compelling rules as the "immediately before Foo" requirement is arbitrary.
> Why does `a::b::Foo` match `a::b::c::Foo` and not `a::c::b::Foo`? In each case we've omitted a namespace inside the query, the only difference is what it's sandwiched between.
> 
> Suffix makes intuitive sense, it accepts strings that make sense *somewhere* in the codebase.
> Subsequence makes intuitive sense too: you're allowed to forget uninteresting components, similar to how fuzzy-match lets you omit uninteresting words.
> 
> I'd prefer one of those and don't really mind which - I'd assumed subsequence was intended. Suffix is way easier to implement though :-)
Thanks for the nice table :D

I was mainly shying away from subsequence as I thought it would be too loose, but I suppose it is okay thinking about the fuzzyfind logic. So I've changed the implementation to be that way, while adding some tests.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88814/new/

https://reviews.llvm.org/D88814