[PATCH] D44882: [clangd] Implementation of workspace/symbol request

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Mar 29 10:42:04 PDT 2018


sammccall added a comment.

This is fantastic, really looking forward to using it.
I'm going to need to give up on vim at this rate, or do some serious work on ycm.

Most significant comments are around the new functions in SourceLocation, and whether we can keep the type-mapping restricted to the LSP layer.



================
Comment at: clangd/ClangdLSPServer.cpp:99
       Params.capabilities.textDocument.completion.completionItem.snippetSupport;
+  if (Params.capabilities.workspace && Params.capabilities.workspace->symbol &&
+      Params.capabilities.workspace->symbol->symbolKind)
----------------
malaperle wrote:
> sammccall wrote:
> > This is the analogue to the one on `completion` that we currently ignore, and one on `symbol` corresponding to the `documentSymbol` call which isn't implemented yet.
> > 
> > I think the value of the extended types is low and it might be worth leaving out of the first version to keep complexity down.
> > (If we really want it, the supporting code (mapping tables) should probably be in a place where it can be reused by `documentSymbol` and can live next to the equivalent for `completion`...)
> I think having Struct and EnumMember is nice and for sure once Macros is there we will want to use it. So would it be OK to move to FindSymbols.cpp (the new common file for document/symbol and workspace/symbol)?
> I think having Struct and EnumMember is nice and for sure once Macros is there we will want to use it.
OK, fair enough.

I think what bothers me about this option is how deep it goes - this isn't really a request option in any useful sense.
I'd suggest the "C++ layer" i.e. ClangdServer should always return full-fidelity results, and the "LSP layer" should deal with folding them.

In which case adjustKindToCapability should live somewhere like protocol.h and get called from this file - WDYT?


================
Comment at: clangd/ClangdServer.h:161
 
+  void workspaceSymbols(StringRef Query,
+                        const clangd::WorkspaceSymbolOptions &Opts,
----------------
nit: add a comment for this API, e.g. `/// Retrieve the top symbols from the workspace matching a query`


================
Comment at: clangd/FindSymbols.cpp:28
+  // All clients should support those.
+  if (Kind >= SymbolKind::File && Kind <= SymbolKind::Array)
+    return Kind;
----------------
Can we rather have this condition checked when the client sets the supported symbols, and remove this special knowledge here?


================
Comment at: clangd/FindSymbols.cpp:32
+  if (supportedSymbolKinds &&
+      std::find(supportedSymbolKinds->begin(), supportedSymbolKinds->end(),
+                Kind) != supportedSymbolKinds->end())
----------------
this seems gratuituously inefficient - I guess it doesn't matter that much, but can't we store the supported kinds as a std::bitset or so?


================
Comment at: clangd/FindSymbols.cpp:36
+
+  // Provide some fall backs for common kinds that are close enough.
+  if (Kind == SymbolKind::Struct)
----------------
nit: if we think this will be extended, a switch might be clearer


================
Comment at: clangd/FindSymbols.cpp:42
+
+  if (!supportedSymbolKinds) {
+    // Provide some sensible default when all fails.
----------------
This code shouldn't need to handle this case. The LSP specifies the default set of supported types if it's not provided, so ClangdLSPServer should enure we always have a valid set


================
Comment at: clangd/FindSymbols.cpp:114
+  std::vector<SymbolInformation> Result;
+  if (Query.empty() || !Index)
+    return Result;
----------------
why Query.empty()?
In practice maybe showing results before you type is bad UX, but that seems up to the editor to decide.


================
Comment at: clangd/FindSymbols.cpp:117
+
+  auto Names = splitQualifiedName(Query);
+
----------------
nit: move this below the next "paragraph" where it's used?


================
Comment at: clangd/FindSymbols.cpp:130
+
+  // Global scope is represented by "" in FuzzyFind.
+  if (Names.first.startswith("::"))
----------------
nit: this is `Names.first.consume_front("::")`
Might be clearer as "FuzzyFind doesn't want leading :: qualifier" to avoid implication, that global namespace is special, particularly because...


================
Comment at: clangd/FindSymbols.cpp:130
+
+  // Global scope is represented by "" in FuzzyFind.
+  if (Names.first.startswith("::"))
----------------
sammccall wrote:
> nit: this is `Names.first.consume_front("::")`
> Might be clearer as "FuzzyFind doesn't want leading :: qualifier" to avoid implication, that global namespace is special, particularly because...
actually the global namespace *is* special :-)

IMO If you type `Foo`, you should get results from any namespace, but if you type `::Foo` you should only get results from the global namespace.

So something like:
```
if (Names.first.consume_front("::") || !Names.first.empty())
  Req.Scopes = {Names.first};
```
though more explicit handling of the cases may be clearer


================
Comment at: clangd/FindSymbols.cpp:141
+  Index->fuzzyFind(Req, [&TempSM, &Result, &Opts](const Symbol &Sym) {
+    auto &CD = Sym.Definition ? Sym.Definition : Sym.CanonicalDeclaration;
+    auto Uri = URI::parse(CD.FileURI);
----------------
Ugh, I never know what the right thing is here, decl/def split is a mess, and there's lots of different workflows.

I don't have any problem with the choice made here (personally I'd probably prefer canonical decl if we had working gotodefn), but it's subtle, so I think it should be a bit louder :-)
Maybe just a comment `// Prefer the definition over e.g. a function declaration in a header` - that's the most important case that differs.

If we see this decision popping up again maybe we should put it as a function on Symbol to call out/codify this decision?
e.g. `SymbolLocation Symbol::NavigationLocation() { return Definition ? Definition : CanonicalDeclaration; }`
but up to you' that's probably premature



================
Comment at: clangd/FindSymbols.cpp:156
+    }
+    auto L =
+        offsetRangeToLocation(TempSM, *Path, CD.StartOffset, CD.StartOffset);
----------------
nit: move into if?


================
Comment at: clangd/FindSymbols.cpp:159
+    if (L) {
+      SymbolKind SK =
+          adjustKindToCapability(indexSymbolKindToSymbolKind(Sym.SymInfo.Kind),
----------------
maybe add an assertion that the adjusted SK is supported? (This should be easier if the supportedSymbolKinds is always provided)


================
Comment at: clangd/FindSymbols.cpp:162
+                                 Opts.supportedSymbolKinds);
+      Result.push_back({Sym.Name, SK, *L, Sym.Scope});
+    }
----------------
The most obvious reading of the spec to me, and the one that matches the language servers I found, is that the containerName should be "foo::bar", not "foo::bar::"


================
Comment at: clangd/Protocol.cpp:194
+    return false;
+  O.map("valueSet", R.valueSet);
+  return true;
----------------
there's two right ways to do this which are conflated a bit here:
 - make the field of type T, give it a meaningful default value, and map it optionally (ignore return value of map())
 - make the field of type Optional<T>, let the default be none, and require map() to succeed (which it will if the field is missing)

So I think you want `O && O.map("valueSet", R.valueSet)`, and below. This has the advantage that if valueSet is present in the message but has an invalid value, parsing will fail.

(Yes, this is confusing and not at all ideal, sorry...)


================
Comment at: clangd/SourceCode.cpp:81
 
+llvm::Optional<Location>
+sourceRangeToLocation(const SourceManager &SourceMgr,
----------------
This function seems to be biting off more than it can chew - it's really hard to abstract this away and its callsites don't seem to fit the same pattern.

I'd probably suggest moving it back and having FindSymbols do just the part it needs locally.

First, SourceRange doesn't have a consistent set of semantics stronger than "a pair of sourcelocations". The first element always indicates the first point in the range, but the second element can be any of:
 - the first character outside the range (you're assuming this, which I think is actually the rarest in clang)
 - the last character inside the range
 - the first character of the last token inside the range (this is the least useful and I think most common!)

Secondly, there's several ways to interpret a location and the choice is somewhat arbitrary here. (in fact the choice isn't consistent even within the function - you may take the spelling location sometimes and the expansion location other times).

Third, the failure mechanism is opaque - this might fail and we don't describe how, so callsites are going to depend on the implementation details, and possibly end up doing redundant fallbacks.

Maybe most importantly, in the long run this is doing complicated and redundant work to push workspaceSymbols through this function:
 - constructing a path from SourceLocation is useful to xrefs, but documentHighlights had the path in the first place!
 - converting offset->row/column for workspaceSymbols is more simply done without SourceManager (VFS + offsetToPosition) and will go away entirely when we fix the index.


================
Comment at: clangd/index/SymbolCollector.cpp:294
+  assert(!StringRef(QName).startswith("::") &&
+         "Qualified names should not start with '::' here.");
   std::tie(S.Scope, S.Name) = splitQualifiedName(QName);
----------------
the message just repeats the assert expression, remove?
(I think this would be slightly clearer if it was just below OS.flush)


================
Comment at: clangd/tool/ClangdMain.cpp:105
 
+static llvm::cl::opt<int> LimitWorkspaceSymbolResult(
+    "workspace-symbol-limit",
----------------
malaperle wrote:
> MaskRay wrote:
> > malaperle wrote:
> > > sammccall wrote:
> > > > the -completion-limit was mostly to control rollout, I'm not sure this needs to be a flag. If it does, can we make it the same flag as completions (and call it -limit or so?)
> > > I think it's quite similar to "completions", when you type just one letter for example, you can get a lot of results and a lot of JSON output. So it feels like the flag could apply to both completions and workspace symbols. How about -limit-resuts? I think just -limit might be a bit too general as we might want to limit other things.
> > Can these options be set by LSP initialization options?
> They could. Are you say they *should*? We could add it in DidChangeConfigurationParams/ClangdConfigurationParamsChange (workspace/didChangeConfiguration) if we need to. I haven't tried or needed to add it on the client side though. It's not 100% clear what should go in workspace/didChangeConfiguration but I think it should probably things that the user would change more often at run-time. I'm not sure how frequent this "limit" will be changed by the user.
Static configuration that's available at construction time is much easier to work with, it follows the lifetime of the C++ objects. The LSP-initialization is an annoying fact of life, but not too bad since no "real work" needs to happen first.

If a parameter actually needs to be dynamic, we should strive to specify it at request-time, that's also a simple logical model. If we're going to have to extend LSP for this stuff anyway, we might as well extend the requests.

The `didChangeConfiguration` is very difficult to work with in my view, and we should avoid adding more options there at all possible.

Aside: several times I've attempted to clean up the GlobalCompilationDatabase and run up against this requirement that breaks most attempts at abstraction. (I'm curious why just restarting clangd when the compilation database is overridden doesn't work). At some point as we add the abstractions needed to track changes on disk and dependent rebuilds, we'll get a chance to revisit how this works.


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:20
+
+void PrintTo(const SymbolInformation &I, std::ostream *O) {
+  llvm::raw_os_ostream OS(*O);
----------------
nit: please move to operator<< declared in protocol.h


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:22
+  llvm::raw_os_ostream OS(*O);
+  OS << I.containerName << I.name << " - " << toJSON(I);
+}
----------------
(if stripping :: from containername, don't forget to update here)


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:47
+protected:
+  std::unique_ptr<MockFSProvider> FSProvider;
+  std::unique_ptr<MockCompilationDatabase> CDB;
----------------
(nit: why are all these on the heap?)


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:53
+
+  virtual void SetUp() override {
+    FSProvider = llvm::make_unique<MockFSProvider>();
----------------
nit: it's rare to need SetUp, the constructor is fine (and consistent with other tests)


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:95
+      })cpp");
+  EXPECT_THAT(getSymbols("local_var"), IsEmpty());
+  EXPECT_THAT(getSymbols("LocalClass"), IsEmpty());
----------------
these negative tests spell the exact name of the symbol and assert the results are empty.
They'll pass if the name is spelled/formatted wrong - maybe prefer getSymbols("l") or getSymbols("")?


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:99
+
+TEST_F(WorkspaceSymbolsTest, NoParams) {
+  addFile("foo.cpp", R"cpp(
----------------
combine with NoLocals?


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:120
+  EXPECT_THAT(getSymbols("method"), IsEmpty());
+  EXPECT_THAT(getSymbols("ClassWithMembers::method"), IsEmpty());
+  EXPECT_THAT(getSymbols("ClassWithMembers::met"), IsEmpty());
----------------
Can we hold off on these tests until the functionality is in? (though do include *something* in a namespace)


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:179
+
+TEST_F(WorkspaceSymbolsTest, AnonymousNamespace) {
+  addFile("foo.h", R"cpp(
----------------
maybe add more namespace cases to this:

namespace ans1 { int ai1; namespace ans2 { int ai2 } }

"a" -> ans1, ans1::i1, ans1::ans2, ans1::ans2::i2
"::a" -> ans1
"ans1::" -> ans1::ans2, ans1::ai1
"::ans1::ans2::" -> ans1::ans2::ai2


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:209
+
+TEST_F(WorkspaceSymbolsTest, GlobalNamespaceQueries) {
+  addFile("foo.h", R"cpp(
----------------
can you include a symbol inside a namespace, and verify it doesn't get returned? (or combine with other namespaces queries as suggested above)


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:245
+
+TEST_F(WorkspaceSymbolsTest, Enums) {
+  addFile("foo.h", R"cpp(
----------------
the enums/union/inlinenamespace cases seem to be basically testing symbolcollector again - the only differences here are on the indexing side. So I think they're ok to drop.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D44882





More information about the cfe-commits mailing list