[PATCH] D44882: [clangd] Implementation of workspace/symbol request

Tue Mar 27 12:02:15 PDT 2018

malaperle marked 9 inline comments as done.
malaperle added a comment.

In https://reviews.llvm.org/D44882#1048355, @sammccall wrote:

> Can we split this patch up (it's pretty large anyway) and land the `workspaceSymbols` implementation first, without changing the indexer behavior?
>
> I think the indexer changes are in the right direction, but I think "ForCompletion" isn't quite enough metadata, and don't want to block everything on that.

Sounds good. ForCompletion was more of a short term hack in my mind, we can try to split it and make proper metadata.

>> We have a lot more features in mind that will make the index much bigger, like adding all occurrences (maybe not in the current YAML though).
> 
> (tangential)
>  FWIW, I think *that* might be a case where we want quite a different API - I think this is a file-major index operation (handful of results per file) rather than a symbol-major one (results per symbol is essentially unbounded, e.g. `std::string`). At the scale of google's internal codebase, this means we need a different backend, and this may be the case for large open-source codebases too (I know chromium hits scaling issues with indexers they try). 
>  There are also other operations where results are files rather than symbols: which files include file x, etc.

Yeah, there needs to be an API to fetch things per file, such as the dependencies and the occurrences it contains (for updating when the file is deleted, etc). I'm not sure there needs to be an API that queries a file for a given USR though, I think that could be transparent to the caller and an implementation detail. The results could be grouped by file though. Good discussion for future patches :)

>> But having options to control the amount of information indexed sounds good to me as there can be a lot of different project sizes and there can be different tradeoffs. I had in mind to an option for including/excluding the occurrences because without them, you lose workspace/references, call hierarchy, etc, but you still have code completion and workspace/symbol, document/symbol, etc while making the index much smaller. But more options sounds good.
> 
> If we add options, someone has to actually set them.
>  Anything that's too big for google's internal index or similar can be filtered out in the wrapping code (mapreduce) if the indexer exposes enough information to do so.
>  We shouldn't have problems with the in-memory dynamic index.
>  Features that are off in the default static indexer configuration won't end up getting used much.

I think everything should be ON by default but we can run into issues that the user wants to reduce index size and help indexing speed. We had such options in CDT since there are quite a few things that can make it big (macro expansion steps, all references, etc). But this can be added as needed later.

> So I can see a use for filters where the command-line indexer (YAML) would run too slow otherwise, but a) let's cross that bridge when we come to it

Sounds good.

In https://reviews.llvm.org/D44882#1049007, @ilya-biryukov wrote:

> >>> - Current `fuzzyFind` implementation can only match qualifiers strictly (i.e. st::vector won't match std::vector). Should we look into allowing fuzzy matches for this feature?  (not in this patch, rather in the long term).
> >> 
> >> I'm not sure, I'm thinking there should be a balance between how fuzzy it it and how much noise it generates. Right now I find it a bit too fuzzy since when I type "Draft" (to find DraftMgr), it picks up things like DocumentRangeFormattingParams. Adding fuzziness to the namespace would make this worse. Maybe with improved scoring it won't matter too much? I'll try FuzzyMatcher and see.
> > 
> > +1, I think experience with `workspaceSymbols` will help us answer this question.
>
> I was using an IDE that had fuzzy find for an equivalent of `workspaceSymbols` and found it to be an amazing experience. And having consistent behavior between different features is really nice.
>  Good ranking is a key to it being useful, though. If when typing `Draft` you get `DocumentRangeFormattingParams` ranked higher than `DraftMgr` that's a bug in FuzzyMatcher. If you have some not-very-nice results at the end of the list, this shouldn't be a problem in most cases.
>
> I'm highly in favor of enabling fuzzy matching for `workspaceSymbols`.

I think it will be fine once we address the bug(s). As long as the exact matches are always first then there's no problem.

================
Comment at: clangd/ClangdLSPServer.cpp:99
       Params.capabilities.textDocument.completion.completionItem.snippetSupport;
+  if (Params.capabilities.workspace && Params.capabilities.workspace->symbol &&
+      Params.capabilities.workspace->symbol->symbolKind)
----------------
sammccall wrote:
> This is the analogue to the one on `completion` that we currently ignore, and one on `symbol` corresponding to the `documentSymbol` call which isn't implemented yet.
> 
> I think the value of the extended types is low and it might be worth leaving out of the first version to keep complexity down.
> (If we really want it, the supporting code (mapping tables) should probably be in a place where it can be reused by `documentSymbol` and can live next to the equivalent for `completion`...)
I think having Struct and EnumMember is nice and for sure once Macros is there we will want to use it. So would it be OK to move to FindSymbols.cpp (the new common file for document/symbol and workspace/symbol)?

================
Comment at: clangd/ClangdLSPServer.h:96
+  /// Indicates whether or not the index is available.
+  bool SymbolIndexAvailable = false;

----------------
sammccall wrote:
> actually, I'm not sure this is worth tracking at all.
> Dynamic index is on by default, and the worst that can happen is some dynamically-configured clients will show a menu option in a non-default configuration, but it won't actually work.
> I don't think fixing this is worth the layering violation
I wonder if the "enable-index-based" is even necessary...

================
Comment at: clangd/WorkspaceSymbols.cpp:1
+//===--- WorkspaceSymbols.cpp ------------------------------------*- C++-*-===//
+//
----------------
sammccall wrote:
> as noted elsewhere, adjustKindToCapability etc need to be sharable, so I'm not sure this file structure quite works - it's named too narrowly to have any shared functionality, but if you pull those functions out there's only one left.
> 
> Maybe we could rename this FindSymbols.cpp, and then documentSymbols will also live here?
FindSymbols.cpp sounds good, I was thinking of moving things around once I introduce document/symbols but we might as well name this appropriately in the first place.

================
Comment at: clangd/WorkspaceSymbols.cpp:165
+            [](const SymbolInformation &A, const SymbolInformation &B) {
+              return A.name < B.name;
+            });
----------------
sammccall wrote:
> malaperle wrote:
> > ilya-biryukov wrote:
> > > We have `FuzzyMatcher`, it can produce decent match scores and is already used in completion.
> > > Any reason not to use it here?
> > I think it was giving odd ordering but let me try this again and at least document the reason.
> I think it's going to be worth pulling out a scoring concept/function here, we can't reuse the priority from the sema codecompletion, but the server's going to provide useful scoring signals.
> I'm not sure how much we want to keep this in sync with code completion overall, though.
I think I'm going to remove the sort here and try to investigate the FuzzyMatcher separately. It seems to work correctly for code completion but for workspace symbols there are many more results and it looks like the scoring needs to be tweaked.

================
Comment at: clangd/WorkspaceSymbols.h:37
+                    const SymbolIndex *const Index,
+                    llvm::IntrusiveRefCntPtr<vfs::FileSystem> VFS);
+
----------------
sammccall wrote:
> so reading all the files we return from the index is obviously going to be bad, we screwed up the interfaces :-)
> 
> From discussing with people who've dealt with similar things, we're probably going to need to store both row/col and offset in the index. I'll try to fix this soon...
> What you're doing for now is fine, but should probably have loud fixmes on it.
I added a FIXME in the implementation. You should probably chime in https://reviews.llvm.org/D39050, about storing more things in the index to not read all files :-)

================
Comment at: clangd/index/Index.h:143
+  // Whether or not the symbol should be considered for completion.
+  bool ForCompletion = false;
+
----------------
sammccall wrote:
> This strikes me as conflating a bunch of ideas that are independently important (e.g. is the symbol top-level, is it visible outside the declaring file). We should design this carefully after considering LSP features that might use the index.
Agreed. I was hesitant between adding a bunch of additional information/fields or adding just one for simplicity and revisit when we need more features. But it would be best if this could be well defined from the get go and was hoping for some suggestions on how to go about this. Doing this in a separate patch makes a lot of sense to me.

================
Comment at: clangd/tool/ClangdMain.cpp:105

+static llvm::cl::opt<int> LimitWorkspaceSymbolResult(
+    "workspace-symbol-limit",
----------------
sammccall wrote:
> the -completion-limit was mostly to control rollout, I'm not sure this needs to be a flag. If it does, can we make it the same flag as completions (and call it -limit or so?)
I think it's quite similar to "completions", when you type just one letter for example, you can get a lot of results and a lot of JSON output. So it feels like the flag could apply to both completions and workspace symbols. How about -limit-resuts? I think just -limit might be a bit too general as we might want to limit other things.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D44882