[PATCH] D44882: [clangd] Implementation of workspace/symbol request

Marc-Andre Laperle via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Apr 4 12:57:53 PDT 2018


malaperle added inline comments.


================
Comment at: clangd/ClangdLSPServer.cpp:99
       Params.capabilities.textDocument.completion.completionItem.snippetSupport;
+  if (Params.capabilities.workspace && Params.capabilities.workspace->symbol &&
+      Params.capabilities.workspace->symbol->symbolKind)
----------------
sammccall wrote:
> malaperle wrote:
> > sammccall wrote:
> > > This is the analogue to the one on `completion` that we currently ignore, and one on `symbol` corresponding to the `documentSymbol` call which isn't implemented yet.
> > > 
> > > I think the value of the extended types is low and it might be worth leaving out of the first version to keep complexity down.
> > > (If we really want it, the supporting code (mapping tables) should probably be in a place where it can be reused by `documentSymbol` and can live next to the equivalent for `completion`...)
> > I think having Struct and EnumMember is nice and for sure once Macros is there we will want to use it. So would it be OK to move to FindSymbols.cpp (the new common file for document/symbol and workspace/symbol)?
> > I think having Struct and EnumMember is nice and for sure once Macros is there we will want to use it.
> OK, fair enough.
> 
> I think what bothers me about this option is how deep it goes - this isn't really a request option in any useful sense.
> I'd suggest the "C++ layer" i.e. ClangdServer should always return full-fidelity results, and the "LSP layer" should deal with folding them.
> 
> In which case adjustKindToCapability should live somewhere like protocol.h and get called from this file - WDYT?
Sounds good, I moved it.


================
Comment at: clangd/ClangdLSPServer.cpp:258
+        if (!Items)
+          return replyError(ErrorCode::InvalidParams,
+                            llvm::toString(Items.takeError()));
----------------
hokein wrote:
> `InvalidParams` doesn't match all cases where internal errors occur. Maybe use `ErrorCode::InternalError`?
Sounds good.


================
Comment at: clangd/FindSymbols.cpp:28
+  // All clients should support those.
+  if (Kind >= SymbolKind::File && Kind <= SymbolKind::Array)
+    return Kind;
----------------
sammccall wrote:
> Can we rather have this condition checked when the client sets the supported symbols, and remove this special knowledge here?
"supportedSymbolKinds" is now populated with those base symbol kinds at initialization.


================
Comment at: clangd/FindSymbols.cpp:42
+
+  if (!supportedSymbolKinds) {
+    // Provide some sensible default when all fails.
----------------
sammccall wrote:
> This code shouldn't need to handle this case. The LSP specifies the default set of supported types if it's not provided, so ClangdLSPServer should enure we always have a valid set
My thought is that if we start dealing with one of those:
```
  Object = 19,
  Key = 20,
  Null = 21,
  Event = 24,
  Operator = 25,
  TypeParameter = 26
```
and it's an older client that only supports up to "Array = 18", then we can provide a fallback, otherwise the client my not handle the unknown kind gracefully. But we can add this later if necessary I guess.


================
Comment at: clangd/FindSymbols.cpp:114
+  std::vector<SymbolInformation> Result;
+  if (Query.empty() || !Index)
+    return Result;
----------------
sammccall wrote:
> why Query.empty()?
> In practice maybe showing results before you type is bad UX, but that seems up to the editor to decide.
vscode sends an empty query and it also doesn't show all symbols in Typescript for example, so that's why I thought that would be best.


================
Comment at: clangd/FindSymbols.cpp:130
+
+  // Global scope is represented by "" in FuzzyFind.
+  if (Names.first.startswith("::"))
----------------
sammccall wrote:
> sammccall wrote:
> > nit: this is `Names.first.consume_front("::")`
> > Might be clearer as "FuzzyFind doesn't want leading :: qualifier" to avoid implication, that global namespace is special, particularly because...
> actually the global namespace *is* special :-)
> 
> IMO If you type `Foo`, you should get results from any namespace, but if you type `::Foo` you should only get results from the global namespace.
> 
> So something like:
> ```
> if (Names.first.consume_front("::") || !Names.first.empty())
>   Req.Scopes = {Names.first};
> ```
> though more explicit handling of the cases may be clearer
Sounds good. I think this works well.


================
Comment at: clangd/FindSymbols.cpp:141
+  Index->fuzzyFind(Req, [&TempSM, &Result, &Opts](const Symbol &Sym) {
+    auto &CD = Sym.Definition ? Sym.Definition : Sym.CanonicalDeclaration;
+    auto Uri = URI::parse(CD.FileURI);
----------------
sammccall wrote:
> Ugh, I never know what the right thing is here, decl/def split is a mess, and there's lots of different workflows.
> 
> I don't have any problem with the choice made here (personally I'd probably prefer canonical decl if we had working gotodefn), but it's subtle, so I think it should be a bit louder :-)
> Maybe just a comment `// Prefer the definition over e.g. a function declaration in a header` - that's the most important case that differs.
> 
> If we see this decision popping up again maybe we should put it as a function on Symbol to call out/codify this decision?
> e.g. `SymbolLocation Symbol::NavigationLocation() { return Definition ? Definition : CanonicalDeclaration; }`
> but up to you' that's probably premature
> 
I think perhaps this could be a user option and making that decision (reading the option and returning the decl) would make sense in a function. Perhaps this can be added a bit later.


================
Comment at: clangd/FindSymbols.cpp:159
+    if (L) {
+      SymbolKind SK =
+          adjustKindToCapability(indexSymbolKindToSymbolKind(Sym.SymInfo.Kind),
----------------
sammccall wrote:
> maybe add an assertion that the adjusted SK is supported? (This should be easier if the supportedSymbolKinds is always provided)
I added an assertion in ClangdLSPServer, where adjustKindToCapability is now called.


================
Comment at: clangd/FindSymbols.cpp:162
+                                 Opts.supportedSymbolKinds);
+      Result.push_back({Sym.Name, SK, *L, Sym.Scope});
+    }
----------------
sammccall wrote:
> The most obvious reading of the spec to me, and the one that matches the language servers I found, is that the containerName should be "foo::bar", not "foo::bar::"
Sounds good. I had to create a temp std::string *and* StringRef, yuck.


================
Comment at: clangd/FindSymbols.cpp:145
+    if (!Uri) {
+      log(llvm::formatv(
+          "Workspace symbol: Could not parse URI '{0}' for symbol '{1}'.",
----------------
hokein wrote:
> I think we may want to report the error to client instead of just logging them.
I'm not sure, wouldn't that mean aborting the whole request? I think by logging only, if one symbol has a problem and not the others, the request could still provide a lot of good results.


================
Comment at: clangd/SourceCode.cpp:81
 
+llvm::Optional<Location>
+sourceRangeToLocation(const SourceManager &SourceMgr,
----------------
sammccall wrote:
> This function seems to be biting off more than it can chew - it's really hard to abstract this away and its callsites don't seem to fit the same pattern.
> 
> I'd probably suggest moving it back and having FindSymbols do just the part it needs locally.
> 
> First, SourceRange doesn't have a consistent set of semantics stronger than "a pair of sourcelocations". The first element always indicates the first point in the range, but the second element can be any of:
>  - the first character outside the range (you're assuming this, which I think is actually the rarest in clang)
>  - the last character inside the range
>  - the first character of the last token inside the range (this is the least useful and I think most common!)
> 
> Secondly, there's several ways to interpret a location and the choice is somewhat arbitrary here. (in fact the choice isn't consistent even within the function - you may take the spelling location sometimes and the expansion location other times).
> 
> Third, the failure mechanism is opaque - this might fail and we don't describe how, so callsites are going to depend on the implementation details, and possibly end up doing redundant fallbacks.
> 
> Maybe most importantly, in the long run this is doing complicated and redundant work to push workspaceSymbols through this function:
>  - constructing a path from SourceLocation is useful to xrefs, but documentHighlights had the path in the first place!
>  - converting offset->row/column for workspaceSymbols is more simply done without SourceManager (VFS + offsetToPosition) and will go away entirely when we fix the index.
> This function seems to be biting off more than it can chew - it's really hard to abstract this away and its callsites don't seem to fit the same pattern.
> I'd probably suggest moving it back and having FindSymbols do just the part it needs locally.

Sure, I'll move it back/ make a local one. In the end the one in xrefs will be quite similar, especially once we add the proper "go to definition" (soon!) but maybe it's less confusing to duplicate some code here.

>you may take the spelling location sometimes and the expansion location other times

What do you mean here? I see spelling but not expansion, maybe I missed it.

> converting offset->row/column for workspaceSymbols is more simply done without SourceManager (VFS + offsetToPosition) and will go away entirely when we fix the index.
> 

More simply but I'm not 100% sure it's better. I would be good to somehow reuse the logic in SourceManager's ComputeLineNumbers as it seems to be faster (SSE?) and handles more cases (\r?).
If I were to change the code to use VFS, how would I go about it? I would use FileSystem::getBufferForFile but to prevent reloading the same buffer over and over, then I would need to keep a map of filenames -> buffers, which pretty much FileManager/SourceManager provides already.


================
Comment at: clangd/SourceCode.h:59
+/// Turn a pair of offsets into a Location.
+llvm::Optional<Location>
+offsetRangeToLocation(const DraftStore &DS, SourceManager &SourceMgr,
----------------
hokein wrote:
> We should use `llvm::Expected`?
> 
> The function needs a comment documenting its behavior, especially on the unsaved file content. 
I removed this one.


================
Comment at: clangd/SourceCode.h:61
+offsetRangeToLocation(const DraftStore &DS, SourceManager &SourceMgr,
+                      StringRef File, size_t OffsetStart, size_t OffsetEnd);
+
----------------
hokein wrote:
> nit: name `FilePath` or `FileName`, `File` seems to be a bit confusing, does it mean `FileContent` or `FileName`?
removed


================
Comment at: clangd/index/SymbolCollector.cpp:294
+  assert(!StringRef(QName).startswith("::") &&
+         "Qualified names should not start with '::' here.");
   std::tie(S.Scope, S.Name) = splitQualifiedName(QName);
----------------
sammccall wrote:
> the message just repeats the assert expression, remove?
> (I think this would be slightly clearer if it was just below OS.flush)
The assert was just moved from splitQualifiedName but I can remove the message and move it.


================
Comment at: clangd/tool/ClangdMain.cpp:105
 
+static llvm::cl::opt<int> LimitWorkspaceSymbolResult(
+    "workspace-symbol-limit",
----------------
sammccall wrote:
> malaperle wrote:
> > MaskRay wrote:
> > > malaperle wrote:
> > > > sammccall wrote:
> > > > > the -completion-limit was mostly to control rollout, I'm not sure this needs to be a flag. If it does, can we make it the same flag as completions (and call it -limit or so?)
> > > > I think it's quite similar to "completions", when you type just one letter for example, you can get a lot of results and a lot of JSON output. So it feels like the flag could apply to both completions and workspace symbols. How about -limit-resuts? I think just -limit might be a bit too general as we might want to limit other things.
> > > Can these options be set by LSP initialization options?
> > They could. Are you say they *should*? We could add it in DidChangeConfigurationParams/ClangdConfigurationParamsChange (workspace/didChangeConfiguration) if we need to. I haven't tried or needed to add it on the client side though. It's not 100% clear what should go in workspace/didChangeConfiguration but I think it should probably things that the user would change more often at run-time. I'm not sure how frequent this "limit" will be changed by the user.
> Static configuration that's available at construction time is much easier to work with, it follows the lifetime of the C++ objects. The LSP-initialization is an annoying fact of life, but not too bad since no "real work" needs to happen first.
> 
> If a parameter actually needs to be dynamic, we should strive to specify it at request-time, that's also a simple logical model. If we're going to have to extend LSP for this stuff anyway, we might as well extend the requests.
> 
> The `didChangeConfiguration` is very difficult to work with in my view, and we should avoid adding more options there at all possible.
> 
> Aside: several times I've attempted to clean up the GlobalCompilationDatabase and run up against this requirement that breaks most attempts at abstraction. (I'm curious why just restarting clangd when the compilation database is overridden doesn't work). At some point as we add the abstractions needed to track changes on disk and dependent rebuilds, we'll get a chance to revisit how this works.
> (I'm curious why just restarting clangd when the compilation database is overridden doesn't work).

Restarting Clangd would mean that the client has the burden to know that it has to restart Clangd and "reopen" all the files, which I don't think is specified in the LSP. I think the client would be free to just not restart a "crashed" server. VSCode does restart a crashing server (up to 5 times?) but that's arbitrary I think.
We also want to soon look at handling the CDB changing through the "workspace/didChangeWatchedFiles" and track which flags for which files actually changes, which means some files would be reparsed and others not.


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:47
+protected:
+  std::unique_ptr<MockFSProvider> FSProvider;
+  std::unique_ptr<MockCompilationDatabase> CDB;
----------------
sammccall wrote:
> (nit: why are all these on the heap?)
I needed to set ServerOpts.BuildDynamicSymbolIndex = true; before creating the ClangServer. But I extracted this to a test-file-specific optsForTest() and used that instead. Much nicer.


================
Comment at: unittests/clangd/FindSymbolsTests.cpp:245
+
+TEST_F(WorkspaceSymbolsTest, Enums) {
+  addFile("foo.h", R"cpp(
----------------
sammccall wrote:
> the enums/union/inlinenamespace cases seem to be basically testing symbolcollector again - the only differences here are on the indexing side. So I think they're ok to drop.
My reasoning was that this was also making sure that getWorkspaceSymbols worked for those cases, regardless of how it was implemented, i.e., being dependent on SymbolCollector. It would be possible that getWorkspaceSymbols breaks one of these cases, although unlikely given its *current* implementation. I think a good blend of tests is both some tests that take into consideration the implementation and also tests that do not consider the implementation but rather expected functionality. I guess in that sense, I was trying to add a bit more black box testing of functionality.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D44882





More information about the cfe-commits mailing list